
Random forest learning notes

2022-06-21 20:39:00 Running white gull

Introduction

Random forest is an ensemble learning algorithm: it combines the predictions of many decision trees to obtain a better model than any single classifier.

Ensemble learning algorithms integrate the modeling results of multiple evaluators (estimators) and aggregate them into one comprehensive result, in order to achieve better regression or classification performance than a single model.

Common ensemble learning algorithms include random forest, gradient boosted trees (GBDT), and XGBoost.

Types of ensemble algorithms

Bagging: build multiple models that make predictions independently of one another; the result of the ensemble estimator is determined by averaging the predictions or by majority voting. The typical representative is random forest.

Boosting: first make predictions with one model, then give the samples that were mispredicted a higher weight in the next model. The core idea is to combine multiple weak learners and strengthen them iteratively into a powerful model. Representative models are AdaBoost and gradient boosted trees.
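As a minimal sketch of the two families (the iris dataset and the hyperparameters here are illustrative choices, not from the original article):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Bagging family: a random forest averages many independently trained trees
bagging = RandomForestClassifier(n_estimators=50, random_state=0)
bagging_acc = cross_val_score(bagging, X, y, cv=5).mean()

# Boosting family: AdaBoost trains trees sequentially, up-weighting
# the samples the previous round misclassified
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)
boosting_acc = cross_val_score(boosting, X, y, cv=5).mean()

print(f"RandomForest CV accuracy: {bagging_acc:.3f}")
print(f"AdaBoost     CV accuracy: {boosting_acc:.3f}")
```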

The ensemble algorithm library

Class | Function
ensemble.AdaBoostClassifier | AdaBoost classification
ensemble.AdaBoostRegressor | AdaBoost regression
ensemble.BaggingClassifier | Bagging classifier
ensemble.BaggingRegressor | Bagging regressor
ensemble.ExtraTreesClassifier | Extra-trees classification
ensemble.ExtraTreesRegressor | Extra-trees regression
ensemble.GradientBoostingClassifier | Gradient boosting classification
ensemble.GradientBoostingRegressor | Gradient boosting regression
ensemble.IsolationForest | Isolation forest
ensemble.RandomForestClassifier | Random forest classification
ensemble.RandomForestRegressor | Random forest regression
ensemble.RandomTreesEmbedding | Ensemble of totally random trees
ensemble.VotingClassifier | Soft voting / majority rule classifier for unfitted estimators

The RandomForest modeling workflow

1. Instantiate, creating the estimator object

2. Train the model through the model interface (fit)

3. Extract the required information through the model interface (e.g. score)

from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier()       # instantiate (note the parentheses)
rfc = rfc.fit(X_train, y_train)      # train on the training set
result = rfc.score(X_test, y_test)   # mean accuracy on the test set
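A complete, runnable version of the three steps above (the iris dataset here is a stand-in for your own X_train/y_train, not part of the original article):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in data; replace with your own training set
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# 1. Instantiate the estimator object
rfc = RandomForestClassifier(n_estimators=100, random_state=0)
# 2. Train through the fit interface
rfc = rfc.fit(X_train, y_train)
# 3. Extract information: test-set accuracy and per-feature importances
result = rfc.score(X_test, y_test)
print("accuracy:", result)
print("feature importances:", rfc.feature_importances_)
```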

The function prototype

class sklearn.ensemble.RandomForestClassifier(n_estimators=10, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, bootstrap=True, oob_score=False, n_jobs=None, random_state=None, verbose=0, warm_start=False, class_weight=None)

Parameter Introduction

Most parameters are identical to those of decision trees (Decision Tree).

n_estimators

The number of trees in the forest. Generally, the more trees, the better the result, but computation time also increases.

Once the number of trees exceeds a critical value, the algorithm's performance no longer improves significantly.
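A quick sketch of that plateau effect (the iris data and the chosen tree counts are illustrative; exact scores depend on your data):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Accuracy typically climbs as trees are added, then flattens past a threshold
scores = {}
for n in (1, 10, 50, 200):
    model = RandomForestClassifier(n_estimators=n, random_state=0)
    scores[n] = cross_val_score(model, X, y, cv=5).mean()
    print(f"n_estimators={n:>3}  mean CV accuracy={scores[n]:.3f}")
```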

max_features

The size of the random subset of features considered when splitting a node.

The lower the value, the greater the reduction in variance, but also the greater the increase in bias. Empirically, max_features = None (always consider all features) is a good default for regression problems, and max_features = "sqrt" (randomly consider sqrt(n_features) features, where n_features is the number of features) for classification problems.
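A sketch comparing the two settings on a classification task (iris data for illustration; the scores are not from the original article):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# "sqrt": each split considers sqrt(n_features) random features (lower variance)
# None:   each split considers all features (lower bias, higher variance)
for mf in ("sqrt", None):
    model = RandomForestClassifier(n_estimators=100, max_features=mf,
                                   random_state=0)
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"max_features={mf!s:>4}  mean CV accuracy={acc:.3f}")
```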

max_depth

min_samples_split

The combination max_depth = None with min_samples_split = 2 (i.e., growing fully developed trees) usually works well.
bootstrap

In random forests, bootstrap sampling is used by default (bootstrap = True), whereas the default strategy for extra-trees is to use the whole dataset (bootstrap = False). When bootstrap sampling is used, the generalization accuracy can be estimated on the remaining, or out-of-bag, samples by setting oob_score = True.
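Out-of-bag estimation in practice (a sketch using the iris data; the dataset choice is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# With bootstrap sampling, each tree sees a resampled training set and leaves
# roughly 37% of the samples out; oob_score=True scores the model on those
# out-of-bag samples as a built-in estimate of generalization accuracy
rfc = RandomForestClassifier(
    n_estimators=100, bootstrap=True, oob_score=True, random_state=0
)
rfc.fit(X, y)
print("out-of-bag accuracy estimate:", rfc.oob_score_)
```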

Original site

Copyright notice
This article was written by [Running white gull]; please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/172/202206211853335060.html