
Random forest learning notes

2022-06-21 20:39:00 Running white gull

Introduction

Random forest is an ensemble learning algorithm: it combines the predictions of many decision trees to obtain a better model than any single classifier.

Ensemble learning algorithms integrate the modeling results of multiple evaluators (estimators) and aggregate them into one comprehensive result, in order to achieve better regression or classification performance than a single model.

Common ensemble learning algorithms include random forest, gradient boosted trees (GBDT), and XGBoost.

Types of ensemble algorithms

Bagging: build multiple models that make predictions independently of one another; the result of the ensemble estimator is determined by averaging the predictions or by majority voting. The typical representative is random forest.

Boosting: first make predictions with one model, then give the samples that were mispredicted a higher weight in the next model. The core idea is to combine multiple weak learners and strengthen them iteratively into a powerful model. Representative models are AdaBoost and gradient boosted trees.
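As a minimal sketch of the two families (the iris dataset and the hyperparameters here are illustrative choices, not from the original article):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Bagging family: a random forest averages many independently trained trees
bagging = RandomForestClassifier(n_estimators=50, random_state=0)
bagging_acc = cross_val_score(bagging, X, y, cv=5).mean()

# Boosting family: AdaBoost trains trees sequentially, up-weighting
# the samples the previous round misclassified
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)
boosting_acc = cross_val_score(boosting, X, y, cv=5).mean()

print(f"RandomForest CV accuracy: {bagging_acc:.3f}")
print(f"AdaBoost     CV accuracy: {boosting_acc:.3f}")
```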

The ensemble algorithm library

Class | Function
ensemble.AdaBoostClassifier | AdaBoost classification
ensemble.AdaBoostRegressor | AdaBoost regression
ensemble.BaggingClassifier | Bagging classifier
ensemble.BaggingRegressor | Bagging regressor
ensemble.ExtraTreesClassifier | Extra-trees classification
ensemble.ExtraTreesRegressor | Extra-trees regression
ensemble.GradientBoostingClassifier | Gradient boosting classification
ensemble.GradientBoostingRegressor | Gradient boosting regression
ensemble.IsolationForest | Isolation forest
ensemble.RandomForestClassifier | Random forest classification
ensemble.RandomForestRegressor | Random forest regression
ensemble.RandomTreesEmbedding | Ensemble of totally random trees
ensemble.VotingClassifier | Soft voting / majority rule classifier for unfitted estimators

The RandomForest modeling workflow

1. Instantiate, creating the estimator object

2. Train the model through the model interface (fit)

3. Extract the required information through the model interface (e.g. score)

from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier()       # instantiate (note the parentheses)
rfc = rfc.fit(X_train, y_train)      # train on the training set
result = rfc.score(X_test, y_test)   # mean accuracy on the test set
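A complete, runnable version of the three steps above (the iris dataset here is a stand-in for your own X_train/y_train, not part of the original article):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in data; replace with your own training set
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# 1. Instantiate the estimator object
rfc = RandomForestClassifier(n_estimators=100, random_state=0)
# 2. Train through the fit interface
rfc = rfc.fit(X_train, y_train)
# 3. Extract information: test-set accuracy and per-feature importances
result = rfc.score(X_test, y_test)
print("accuracy:", result)
print("feature importances:", rfc.feature_importances_)
```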

The function prototype

class sklearn.ensemble.RandomForestClassifier(n_estimators=10, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, bootstrap=True, oob_score=False, n_jobs=None, random_state=None, verbose=0, warm_start=False, class_weight=None)

Parameter Introduction

Most parameters are identical to those of decision trees (Decision Tree).

n_estimators

The number of trees in the forest. Generally, the more trees, the better the result, but computation time also increases.

Once the number of trees exceeds a critical value, the algorithm's performance no longer improves significantly.
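A quick sketch of that plateau effect (the iris data and the chosen tree counts are illustrative; exact scores depend on your data):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Accuracy typically climbs as trees are added, then flattens past a threshold
scores = {}
for n in (1, 10, 50, 200):
    model = RandomForestClassifier(n_estimators=n, random_state=0)
    scores[n] = cross_val_score(model, X, y, cv=5).mean()
    print(f"n_estimators={n:>3}  mean CV accuracy={scores[n]:.3f}")
```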

max_features

The size of the random subset of features considered when splitting a node.

The lower the value, the greater the reduction in variance, but also the greater the increase in bias. Empirically, max_features = None (always consider all features) is a good default for regression problems, and max_features = "sqrt" (randomly consider sqrt(n_features) features, where n_features is the number of features) for classification problems.
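A sketch comparing the two settings on a classification task (iris data for illustration; the scores are not from the original article):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# "sqrt": each split considers sqrt(n_features) random features (lower variance)
# None:   each split considers all features (lower bias, higher variance)
for mf in ("sqrt", None):
    model = RandomForestClassifier(n_estimators=100, max_features=mf,
                                   random_state=0)
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"max_features={mf!s:>4}  mean CV accuracy={acc:.3f}")
```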

max_depth

min_samples_split

The combination max_depth = None with min_samples_split = 2 (i.e., growing fully developed trees) usually works well.
bootstrap

In random forests, bootstrap sampling is used by default (bootstrap = True), whereas the default strategy for extra-trees is to use the whole dataset (bootstrap = False). When bootstrap sampling is used, the generalization accuracy can be estimated on the remaining, or out-of-bag, samples by setting oob_score = True.
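Out-of-bag estimation in practice (a sketch using the iris data; the dataset choice is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# With bootstrap sampling, each tree sees a resampled training set and leaves
# roughly 37% of the samples out; oob_score=True scores the model on those
# out-of-bag samples as a built-in estimate of generalization accuracy
rfc = RandomForestClassifier(
    n_estimators=100, bootstrap=True, oob_score=True, random_state=0
)
rfc.fit(X, y)
print("out-of-bag accuracy estimate:", rfc.oob_score_)
```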

Original site

Copyright notice
This article was written by [Running white gull]; please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/172/202206211853335060.html