RF, GBDT, XGBoost feature selection methods
2022-07-25 20:04:00 【Full stack programmer webmaster】
Hello everyone, nice to meet you again. I'm your friend, Quan Jun.
RF, GBDT, and XGBoost can all be used for feature selection; they belong to the embedded class of feature selection methods. In sklearn, for example, you can inspect the feature_importances_ attribute to see how important each feature is:
from sklearn import ensemble

# grd = ensemble.GradientBoostingClassifier(n_estimators=30)
grd = ensemble.RandomForestClassifier(n_estimators=30)
grd.fit(X_train, y_train)
grd.feature_importances_

But how do these three models actually compute feature importance? Let's go through them one by one.
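The snippet above assumes X_train and y_train already exist. Here is a minimal self-contained sketch of the same idea, using a synthetic dataset (make_classification and its parameters are illustrative choices, not from the original post):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: 6 features, of which 3 are informative.
X, y = make_classification(n_samples=500, n_features=6,
                           n_informative=3, n_redundant=1,
                           random_state=0)

rf = RandomForestClassifier(n_estimators=30, random_state=0)
rf.fit(X, y)

# One importance score per feature; scores are non-negative and sum to 1.
print(rf.feature_importances_)
```

The informative features should receive noticeably higher scores than the noise features.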
1. Random Forest (RF)
Random forest uses out-of-bag (OOB) data for its importance estimate. Each tree is built on a bootstrap resample, so some samples are not selected for that tree; these left-out samples act as a built-in validation set. This is one of the advantages of random forests: instead of running cross-validation, you can evaluate the model directly with oob_score_.
The specific procedure is:
1. For each decision tree, compute the error on its out-of-bag data; call it errOOB1.
2. Randomly permute (add noise to) feature i across the OOB samples and compute the out-of-bag error again; call it errOOB2.
3. With N trees, the importance of feature i is sum(errOOB2 - errOOB1) / N.
If adding random noise to a feature makes the out-of-bag accuracy drop significantly, that feature has a large influence on the predictions, which means its importance is high.
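The errOOB comparison above can be approximated with scikit-learn's permutation_importance, which permutes one feature at a time and measures the drop in score. Note this is a sketch: it permutes on a held-out set rather than on each tree's own OOB samples, but it follows the same errOOB2 - errOOB1 idea. The dataset and parameters are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=5,
                           n_informative=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# oob_score=True gives the built-in out-of-bag evaluation from the post.
rf = RandomForestClassifier(n_estimators=50, oob_score=True, random_state=0)
rf.fit(X_tr, y_tr)
print("OOB score:", rf.oob_score_)

# Permute each feature 10 times and record the mean accuracy drop.
result = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=0)
print(result.importances_mean)
```

Features whose permutation barely changes the score get importance near zero; the informative features show a clear drop.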
2. Gradient Boosted Decision Trees (GBDT)
GBDT measures the importance of feature j as the average of its importance over the individual trees:

    J_j^2 = (1/M) * sum_{m=1..M} J_j^2(T_m)

where M is the number of trees. The importance of feature j in a single tree T_m is computed from the reduction in loss produced by the splits on that feature:

    J_j^2(T_m) = sum_{t=1..L-1} i_t^2 * 1(v_t = j)

where L is the number of leaf nodes, so L - 1 is the number of internal (non-leaf) nodes, i_t^2 is the reduction in loss from the split at internal node t, and v_t is the feature used for that split. (The original formula images were lost; the equations above are reconstructed from the surrounding definitions of M, L, and the loss reduction.)
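This averaging is exactly how scikit-learn's GradientBoostingClassifier computes feature_importances_, so we can reproduce it by hand. The sketch below assumes sklearn's internal compute_feature_importances method on the fitted trees; data and parameters are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=400, n_features=5,
                           n_informative=3, random_state=0)
gbdt = GradientBoostingClassifier(n_estimators=20, random_state=0).fit(X, y)

# estimators_ has shape (M, K); K = 1 for binary classification.
# Collect each tree's unnormalized per-feature loss reduction.
per_tree = [t.tree_.compute_feature_importances(normalize=False)
            for stage in gbdt.estimators_ for t in stage]

avg = np.mean(per_tree, axis=0)   # (1/M) * sum over the M trees
manual = avg / avg.sum()          # normalize so the scores sum to 1

# Matches the library's own attribute.
assert np.allclose(manual, gbdt.feature_importances_)
print(manual)
```

In other words, the per-tree importances from the single-tree formula are simply averaged across all M trees and then normalized.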
3. XGBoost
XGBoost scores a feature by the total number of times it is used to split across all trees (this is the "weight" importance type). For example, if a feature is used to split once in the first tree, twice in the second tree, and so on, its score is (1 + 2 + ...).
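XGBoost exposes this count via Booster.get_score(importance_type="weight"). To keep the sketch dependency-free, the code below tallies splits the same way on a scikit-learn forest: every internal node records which feature it splits on, so counting those entries across all trees gives each feature's split count. The dataset and model are illustrative stand-ins.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=4,
                           n_informative=2, random_state=0)
rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

counts = np.zeros(X.shape[1], dtype=int)
for est in rf.estimators_:
    feats = est.tree_.feature      # leaf nodes are marked with -2
    for f in feats[feats >= 0]:    # keep only internal (split) nodes
        counts[f] += 1

print(counts)  # total number of splits per feature across the 10 trees
```

Note that a raw split count favors features with many distinct values; XGBoost also offers "gain" and "cover" importance types that weight splits by their loss reduction or sample coverage.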
Publisher: Full Stack Programmer webmaster. Please indicate the source when reprinting: https://javaforall.cn/127541.html Original link: https://javaforall.cn