Mpai data science platform: random forest classification / regression parameter tuning explained
2022-06-25 12:05:00 【Halosec_ Wei】
Number of decision trees (n_estimators):
This is the number of trees in the forest, that is, the number of base estimators. Its effect on the accuracy of a random forest is roughly monotonic: the more trees, the better the model tends to perform. However, every model has a performance ceiling. Once the number of trees passes a certain point, the accuracy of the random forest usually stops rising or starts to fluctuate. In addition, more trees require more computation and memory, and training takes longer. For this parameter the goal is to balance training cost against model quality; the number of decision trees usually does not exceed 1000.
Value: [1, +∞)
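The plateau described above can be illustrated with a small idealized calculation. This is only a sketch: it assumes every tree votes independently and is correct with the same probability, which real forests do not satisfy (their trees are correlated, so the real ceiling is lower). The function name `majority_vote_accuracy` and the value 0.6 are hypothetical and are not part of the Mpai platform.

```python
from math import comb

def majority_vote_accuracy(n_trees: int, p_correct: float) -> float:
    """Probability that a strict majority of n_trees independent voters is right.

    Assumption (not from the article): trees are independent and each one
    is correct with probability p_correct.
    """
    needed = n_trees // 2 + 1  # smallest number of votes that forms a strict majority
    return sum(
        comb(n_trees, k) * p_correct**k * (1 - p_correct) ** (n_trees - k)
        for k in range(needed, n_trees + 1)
    )

# Accuracy climbs quickly at first, then flattens as more trees are added.
for n in (1, 11, 101, 501):
    print(n, round(majority_vote_accuracy(n, 0.6), 4))
```

Under these assumptions the gain from 1 to 11 trees is much larger than the gain from 101 to 501 trees, while the cost of training keeps growing in proportion to the number of trees.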
Split criterion (criterion):
Regression: the measure of split quality for regression trees. Two criteria are supported: MAE (mean absolute error) and MSE (mean squared error). The formulas are not given here.
Classification: the measure CART trees use to evaluate a split on a feature. Two criteria are supported: Gini index (gini) and information gain (entropy).
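For reference, the four criteria named above can be sketched as impurity functions of the samples in a single node. This is a minimal illustration using the textbook definitions, not the platform's implementation; the function names are hypothetical, and measuring MAE around the node median is an assumption based on common practice.

```python
from collections import Counter
from math import log2
from statistics import mean, median

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits: sum(p_k * log2(1 / p_k))."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())

def mse(values):
    """Mean squared deviation of the targets from the node mean."""
    m = mean(values)
    return mean((v - m) ** 2 for v in values)

def mae(values):
    """Mean absolute deviation of the targets from the node median (assumed convention)."""
    med = median(values)
    return mean(abs(v - med) for v in values)

print(gini(["a", "a", "b", "b"]), entropy(["a", "a", "b", "b"]))
print(mse([1, 2, 6]), mae([1, 2, 6]))
```

A split is then scored by how much it lowers the sample-weighted impurity of the child nodes compared with the parent node.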
Maximum depth of decision tree (max_depth):
By default, the depth of the tree is not limited while the model is built. If the sample size is large and there are many features, limiting the maximum depth is recommended; if there are few samples or few features, the depth can be left unlimited. max_depth usually does not exceed 50.
Value: [1, +∞)
Minimum number of samples required to split an internal node (min_samples_split):
An integer or a floating-point number; the default is 2. It specifies the minimum number of samples an internal (non-leaf) node must hold before it can be split. This value limits when a subtree may keep dividing: if a node has fewer than min_samples_split samples, no further attempt is made to select the best feature to split on. If the sample size is small, this value can be left alone. If the sample size is very large, increasing it is recommended.
Value: [2, +∞)
Minimum number of samples required in each leaf node (min_samples_leaf):
This value limits the minimum number of samples in a leaf node. If a leaf node has fewer samples than this value, it is pruned together with its sibling node. The default is 1. You can enter either an integer minimum number of samples or the minimum as a fraction of the total number of samples. If the sample size is small, this value can be left alone. If the sample size is very large, increasing it is recommended.
Value: [1, +∞)
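The way min_samples_split and min_samples_leaf act together can be sketched as a single check on a candidate split. This is an illustration, not the platform's code: the helper names are hypothetical, rounding a fraction up with `ceil` is an assumption, and rejecting the split is used here as the equivalent of pruning an undersized leaf together with its sibling.

```python
from math import ceil

def resolve_min_samples(value, n_samples, floor):
    """Turn an integer count or a float fraction into an absolute sample count."""
    if isinstance(value, float):
        # Assumption: a fraction is scaled by the training-set size and rounded up.
        return max(floor, ceil(value * n_samples))
    return max(floor, value)

def split_allowed(n_node, n_left, n_right, n_samples,
                  min_samples_split=2, min_samples_leaf=1):
    """Return True if a node with n_node samples may split into n_left / n_right."""
    need_split = resolve_min_samples(min_samples_split, n_samples, floor=2)
    need_leaf = resolve_min_samples(min_samples_leaf, n_samples, floor=1)
    return n_node >= need_split and n_left >= need_leaf and n_right >= need_leaf

# A 10-sample node in a 1000-sample training set.
print(split_allowed(10, 5, 5, 1000))                          # default limits
print(split_allowed(10, 9, 1, 1000, min_samples_leaf=2))      # one child too small
print(split_allowed(10, 5, 5, 1000, min_samples_split=0.25))  # node itself too small
```

Raising either value makes the check fail earlier, which yields shallower trees on large data sets.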
Number of features to consider when searching for the best split of a node (max_features):
The number of features considered when choosing the best split cannot exceed this value. If it is an integer, it is the maximum number of features; if it is a decimal, the limit is the number of training-set features multiplied by that decimal; if it is auto, max_features = sqrt(n_features).
Value: (0, 1]
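The three ways of specifying max_features described above can be sketched as a small resolver, followed by the random draw of candidate features for one split. The function names are hypothetical, and rounding down and keeping at least one feature are assumptions rather than documented platform behavior.

```python
import random
from math import sqrt

def resolve_max_features(max_features, n_features):
    """Map an int, a float fraction, or "auto" to a number of candidate features."""
    if max_features == "auto":
        k = int(sqrt(n_features))            # auto: square root of the feature count
    elif isinstance(max_features, float):
        k = int(max_features * n_features)   # fraction of the training-set features
    else:
        k = int(max_features)                # absolute number of features
    # Assumption: always keep at least one feature and never more than exist.
    return max(1, min(k, n_features))

def candidate_features(max_features, n_features, seed=None):
    """Draw the feature indices that one split is allowed to examine."""
    k = resolve_max_features(max_features, n_features)
    return sorted(random.Random(seed).sample(range(n_features), k))

print(resolve_max_features("auto", 16))
print(resolve_max_features(0.5, 10))
print(candidate_features(3, 10, seed=0))
```

Smaller values make the individual trees more different from one another, at the price of each tree being weaker on its own.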
Maximum number of leaf nodes (max_leaf_nodes):
Limiting the maximum number of leaf nodes helps prevent overfitting. The default is None, meaning the number of leaf nodes is not limited. If a limit is set, the algorithm builds the best decision tree it can within that number of leaf nodes. If there are not many features, this value can be ignored; if there are many features, it can be limited, and a specific value can be chosen through cross-validation.
Value: (0, 1]
Impurity threshold, by information entropy or Gini index (min_impurity_split):
This value limits the growth of the decision tree. If the impurity of a node (based on the Gini index or mean squared error) is below this threshold, the node produces no child nodes and becomes a leaf. Changing the default value of 1e-7 is generally not recommended.
Value: (0, 1]
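The stopping rule above can be sketched for the classification case as follows. This is only an illustration using Gini impurity; the function names are hypothetical and the check is not taken from the platform's code.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of the class labels in one node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def becomes_leaf(labels, min_impurity_split=1e-7):
    """A node stops growing once its impurity falls below the threshold."""
    return gini(labels) < min_impurity_split

print(becomes_leaf(["a"] * 5))               # pure node, impurity 0
print(becomes_leaf(["a", "b"]))              # impurity 0.5, keeps splitting
print(becomes_leaf(["a"] * 9 + ["b"], 0.2))  # a larger threshold stops growth earlier
```

With the default of 1e-7, in practice only pure or almost pure nodes are stopped by this rule, which is why the other limits usually matter more.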
Sampling with replacement (bootstrap):
As the name suggests, this controls whether samples are drawn with replacement when building each decision tree of the random forest. The default is True, meaning sampling with replacement is used.
Value: Yes / No
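Sampling with replacement can be sketched in a few lines. The function name is hypothetical, and the behavior with the option switched off (every tree receives the full training set) is an assumption rather than something stated above.

```python
import random

def draw_training_indices(n_samples, bootstrap=True, seed=None):
    """Pick the row indices one tree is trained on."""
    rng = random.Random(seed)
    if bootstrap:
        # Draw n_samples rows with replacement: some repeat, some never appear.
        return [rng.randrange(n_samples) for _ in range(n_samples)]
    # Assumption: without bootstrap, the tree sees every row exactly once.
    return list(range(n_samples))

indices = draw_training_indices(1000, bootstrap=True, seed=0)
print(len(indices), len(set(indices)))  # total draws vs. distinct rows drawn
```

Because each tree gets a different resampled data set, the trees differ from one another, and this diversity is what the forest's averaging relies on.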
Out-of-bag estimation (oob_score):
Bagging builds each tree model from a random sample of the data, so the samples that were never drawn, that is, the data not involved in building the tree, form the out-of-bag data set. This data can be used to validate the model. When tuning models we could use cross-validation, but it is time-consuming and not really necessary for a random forest, so the out-of-bag data is used to validate the decision tree models instead. This amounts to a simple form of cross-validation: it costs little and works well. The default value is False.
Value: Yes / No
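How much data ends up out of the bag can be sketched as follows. The function names are hypothetical; the formula (1 - 1/n)^n is the standard probability that a given row is missed by n draws with replacement, and it approaches 1/e, about 0.368, for large n.

```python
import random

def out_of_bag_indices(n_samples, seed=None):
    """Rows that a bootstrap sample of size n_samples never drew."""
    rng = random.Random(seed)
    in_bag = {rng.randrange(n_samples) for _ in range(n_samples)}
    return [i for i in range(n_samples) if i not in in_bag]

def expected_oob_fraction(n_samples):
    """Probability that one particular row is missed by all n_samples draws."""
    return (1 - 1 / n_samples) ** n_samples

n = 10_000
oob = out_of_bag_indices(n, seed=1)
print(round(len(oob) / n, 3), round(expected_oob_fraction(n), 3))
```

Roughly a third of the rows are therefore unseen by any single tree, so each row can be scored using only the trees that did not train on it, which gives a validation estimate without a separate hold-out set.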