Random forest learning notes
2022-06-21 20:39:00 【Running white gull】
Introduction
Random forest is an ensemble learning algorithm: it builds many decision trees and combines their predictions into a single, stronger model.
Ensemble learning algorithms combine the modeling results of multiple estimators and aggregate them into one overall result, in order to obtain better regression or classification performance than any single model.
Common ensemble learning algorithms include random forest, gradient boosted decision trees (GBDT), and XGBoost.
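A quick calculation shows why aggregating estimators can beat a single model. The sketch below assumes 25 classifiers that each err with probability 0.2 and, unrealistically, make their errors independently. A majority vote is then wrong only when 13 or more of them are wrong at the same time, i.e. with probability $\sum_{k=13}^{25}\binom{25}{k}0.2^{k}0.8^{25-k}$. The function name `majority_vote_error` is illustrative, not part of any library.

```python
from math import comb


def majority_vote_error(n_estimators: int, base_error: float) -> float:
    """Probability that a majority vote of n classifiers is wrong.

    Assumes every classifier errs independently with the same probability
    base_error; real trees in a forest are correlated, so this is idealized.
    """
    k_min = n_estimators // 2 + 1  # smallest number of wrong votes that wins the majority
    return sum(
        comb(n_estimators, k) * base_error**k * (1 - base_error) ** (n_estimators - k)
        for k in range(k_min, n_estimators + 1)
    )


print(f"{majority_vote_error(25, 0.2):.6f}")  # prints 0.000369
```

Under these assumptions the error falls from 0.2 for one classifier to roughly 0.00037 for the vote. Trees in a real forest are trained on overlapping data and are therefore correlated, so the actual gain is smaller; decorrelating the trees is the motivation for the extra randomness a random forest injects.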
Types of ensemble algorithms
Bagging: build multiple models that predict independently of each other; the result of the ensemble estimator is determined by averaging their predictions or by majority vote. Random forest is the typical representative.
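The bagging procedure can be sketched in a few lines: draw a bootstrap sample (rows sampled with replacement) for each model, train the models independently, and combine them by majority vote. This is a minimal illustration, assuming scikit-learn decision trees as the base models and the breast cancer dataset bundled with scikit-learn; `bagging_fit` and `bagging_predict` are hypothetical helper names. Plain bagging considers every feature at each split; passing `max_features="sqrt"` makes each tree behave more like a random forest tree.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def bagging_fit(X, y, n_estimators=25, max_features=None, seed=0):
    """Fit n_estimators trees, each on its own bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    n_samples = len(X)
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n_samples, size=n_samples)  # draw n rows with replacement
        tree = DecisionTreeClassifier(max_features=max_features,
                                      random_state=int(rng.integers(2**31 - 1)))
        models.append(tree.fit(X[idx], y[idx]))
    return models


def bagging_predict(models, X):
    """Majority vote; assumes class labels are non-negative integers."""
    votes = np.stack([m.predict(X) for m in models])  # shape (n_estimators, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])


X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
models = bagging_fit(X_train, y_train)
acc = (bagging_predict(models, X_test) == y_test).mean()
print(f"bagged trees, test accuracy: {acc:.3f}")
```

Because each model only sees its own sample, the trees can be trained in any order or in parallel, which is the practical difference from boosting.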
Boosting: models are built one after another. After a model makes its predictions, the samples it predicted wrongly are given higher weight when the next model is trained. The core idea is to combine several weak estimators, strengthening them step by step into a powerful model. Representative models are AdaBoost and gradient boosted trees.
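The re-weighting scheme described above is the one used by AdaBoost. Below is a minimal sketch of discrete AdaBoost for a binary problem, assuming labels recoded to -1/+1 and depth-1 trees (decision stumps) as the weak learners; `adaboost_fit` and `adaboost_predict` are hypothetical helper names, and the dataset choice is only for illustration. Gradient boosting follows the same sequential idea but fits each new tree to the gradient of the loss (for squared error, the residuals) instead of re-weighting samples.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def adaboost_fit(X, y, n_rounds=50):
    """Discrete AdaBoost with decision stumps; y must use labels -1 and +1."""
    w = np.full(len(X), 1.0 / len(X))  # start with uniform sample weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1, random_state=0)  # a deliberately weak learner
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)  # weighted error (weights sum to 1)
        alpha = 0.5 * np.log((1 - err) / err)  # more accurate stumps get a larger vote
        w = w * np.exp(-alpha * y * pred)  # y * pred == -1 on mistakes, so their weight grows
        w /= w.sum()  # renormalize so the weights stay a distribution
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas


def adaboost_predict(stumps, alphas, X):
    """Sign of the alpha-weighted sum of stump predictions."""
    score = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.where(score >= 0, 1, -1)


X, y01 = load_breast_cancer(return_X_y=True)
y = np.where(y01 == 1, 1, -1)  # recode {0, 1} as {-1, +1}
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
stumps, alphas = adaboost_fit(X_train, y_train)
acc = (adaboost_predict(stumps, alphas, X_test) == y_test).mean()
print(f"AdaBoost with {len(stumps)} stumps, test accuracy: {acc:.3f}")
```

Each stump alone is a poor classifier; the accuracy comes from the weighted combination, which is the "weak to strong" idea in the paragraph above.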
Ensemble algorithm library (sklearn.ensemble)
| Class | Function |
|---|---|
| ensemble.AdaBoostClassifier | AdaBoost classification |
| ensemble.AdaBoostRegressor | AdaBoost regression |
| ensemble.BaggingClassifier | Bagging classifier |
| ensemble.BaggingRegressor | Bagging regressor |
| ensemble.ExtraTreesClassifier | Extra-trees classification |
| ensemble.ExtraTreesRegressor | Extra-trees regression |
| ensemble.GradientBoostingClassifier | Gradient boosting classification |
| ensemble.GradientBoostingRegressor | Gradient boosting regression |
| ensemble.IsolationForest | Isolation forest (anomaly detection) |
| ensemble.RandomForestClassifier | Random forest classification |
| ensemble.RandomForestRegressor | Random forest regression |
| ensemble.RandomTreesEmbedding | Ensemble of totally random trees (feature embedding) |
| ensemble.VotingClassifier | Soft voting / majority rule classifier for unfitted estimators |
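The supervised classifiers in this table share the same fit/predict/score interface, so they can be swapped into one evaluation loop (IsolationForest and RandomTreesEmbedding are unsupervised, and VotingClassifier wraps other estimators, so they are left out). A rough comparison sketch, assuming the wine dataset bundled with scikit-learn, default hyperparameters, and 5-fold cross-validation; the scores are illustrative, not a benchmark.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)

# Every estimator is used through the same interface, so one loop covers them all.
models = {
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Bagging": BaggingClassifier(random_state=0),
    "ExtraTrees": ExtraTreesClassifier(random_state=0),
    "GradientBoosting": GradientBoostingClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(random_state=0),
}
scores = {}
for name, model in models.items():
    scores[name] = cross_val_score(model, X, y, cv=5).mean()  # mean accuracy over 5 folds
    print(f"{name:>16}: {scores[name]:.3f}")
```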
RandomForest modeling process
1. Instantiate: create the estimator object
2. Train the model through the model's fit interface
3. Extract the required information through the model's interfaces (for example score or predict)
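from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
X, y = load_wine(return_X_y=True)  # example dataset, used here only so the snippet runs
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
rfc = RandomForestClassifier(random_state=0)  # 1. instantiate (note the parentheses)
rfc = rfc.fit(X_train, y_train)  # 2. train through the fit interface
result = rfc.score(X_test, y_test)  # 3. mean accuracy on the test set
The function prototype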
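class sklearn.ensemble.RandomForestClassifier(n_estimators=10, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, bootstrap=True, oob_score=False, n_jobs=None, random_state=None, verbose=0, warm_start=False, class_weight=None)
This is the signature of older scikit-learn releases. Since version 0.22, n_estimators defaults to 100 and the parameters ccp_alpha and max_samples are available; min_impurity_split has since been removed, and max_features='auto' has been removed in favor of the default 'sqrt'.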
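Step 3 of the modeling process relies on the estimator's interfaces. A short sketch, assuming the wine dataset and deliberately small settings (n_estimators=10, max_depth=3) so the output stays readable: `get_params()` reports the constructor arguments from the prototype, with the defaults of whichever scikit-learn version is installed, and after fitting the forest exposes its individual trees (`estimators_`), impurity-based `feature_importances_`, and averaged class probabilities through `predict_proba`.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
rfc = RandomForestClassifier(n_estimators=10, max_depth=3, random_state=0)

# Constructor arguments, defaults included; default values depend on the installed version.
params = rfc.get_params()
print(params["n_estimators"], params["criterion"], params["max_depth"])  # prints: 10 gini 3

rfc.fit(X, y)
first_tree = rfc.estimators_[0]  # every member of the forest is a fitted decision tree
# Tree-level parameters such as max_depth are passed through to each tree.
print(type(first_tree).__name__, first_tree.max_depth, first_tree.get_depth())
print(rfc.feature_importances_.round(3))  # one value per feature, normalized to sum to 1
print(rfc.predict_proba(X[:3]).round(2))  # class probabilities averaged over all trees
```

The fact that each element of `estimators_` is a `DecisionTreeClassifier` carrying the forest's tree parameters is why most forest parameters mirror those of a single decision tree.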
Parameter introduction
Most parameters are the same as those of the decision tree (DecisionTreeClassifier).
| Parameter | Description |
|---|---|
| n_estimators | The number of trees in the forest. Generally, the more trees, the better the result, but the computation time grows as well. Beyond a critical number of trees, the results stop improving significantly. |
| max_features | The size of the random subset of features considered when splitting a node. The lower the value, the greater the reduction in variance, but also the greater the increase in bias. Empirically, good defaults are max_features=None (consider all features) for regression problems and max_features='sqrt' for classification tasks. |
| max_depth, min_samples_split | The combination max_depth=None and min_samples_split=2 (that is, growing fully developed trees) usually gives good results. |
| bootstrap | Random forests use bootstrap sampling by default (bootstrap=True), whereas the default strategy for extra-trees is to use the whole dataset (bootstrap=False). When bootstrap sampling is used, the generalization accuracy can be estimated on the left-out, or out-of-bag, samples; enable this by setting oob_score=True. |
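The effect of n_estimators and the out-of-bag estimate described in the table can be checked directly. The sketch below assumes the breast cancer dataset bundled with scikit-learn; exact scores depend on the data and the library version, so only the general pattern (a large gain from 1 to a few dozen trees, little change afterwards) should be expected.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# n_estimators: accuracy usually improves at first, then levels off
cv_scores = {}
for n in (1, 10, 50, 200):
    model = RandomForestClassifier(n_estimators=n, random_state=0)
    cv_scores[n] = cross_val_score(model, X, y, cv=5).mean()
    print(f"n_estimators={n:>3}: mean CV accuracy {cv_scores[n]:.3f}")

# bootstrap=True (the default) with oob_score=True: each sample is scored
# only by the trees whose bootstrap sample did not contain it
rfc = RandomForestClassifier(n_estimators=200, bootstrap=True, oob_score=True, random_state=0)
rfc.fit(X, y)
print(f"out-of-bag accuracy estimate: {rfc.oob_score_:.3f}")
```

The out-of-bag score comes from a single fit, so it is a cheap alternative to cross-validation when bootstrap sampling is enabled.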