Decision tree learning notes
2022-06-21 20:39:00 【Running white gull】
Types of decision trees
Module: sklearn.tree
tree.DecisionTreeClassifier: Classification tree
tree.DecisionTreeRegressor: Regression tree
tree.export_graphviz: Exports the fitted decision tree in DOT format, which is used for drawing, so the tree can be visualized
tree.ExtraTreeClassifier: Highly randomized version of the classification tree
tree.ExtraTreeRegressor: Highly randomized version of the regression tree
The process of decision tree modeling
A decision tree is grown greedily, and the growing process involves randomness: the features are considered in random order at each split (and only a random subset of them if max_features is set), so repeated fits can produce different trees. Out of several trees built this way, the one that fits best (the highest score) is kept as the final model.
Each node is split according to entropy or the Gini coefficient. Either measure gives the impurity of the node: the lower the impurity, the more uniform the labels of the data in that node. For example, with apples, pears and bananas, a node may be classified as apple yet still contain a few pears and bananas. When the impurity is 0, all records in the node have the same label; a leaf node whose records all share one label has an impurity of 0.
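To make the two impurity measures concrete, here is a small sketch that computes entropy and the Gini coefficient by hand for the fruit example. The counts (8 apples, 1 pear, 1 banana) and the helper names `entropy` and `gini` are made up for illustration; they are not part of sklearn.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

# Hypothetical node: classified as apple, but a pear and a banana remain.
mixed = ["apple"] * 8 + ["pear"] + ["banana"]
# Hypothetical pure leaf: every record has the same label.
pure = ["apple"] * 10

print(entropy(mixed), gini(mixed))  # both greater than 0
print(entropy(pure), gini(pure))    # both equal to 0
```

Both measures agree on the ordering here: the mixed node is impure, the pure leaf is not. In practice the two criteria usually lead to similar trees.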
Related code:
Modeling code
from sklearn import tree

# Xtrain, Ytrain, Xtest, Ytest are assumed to come from an earlier train/test split
clf = tree.DecisionTreeClassifier(
    criterion='entropy',    # impurity measure; the default is 'gini'
    random_state=30,        # fixes the random seed, so entering the same value always generates the same tree
    splitter='random',      # 'best' (default) prefers the more important features when branching; 'random' branches more randomly, which lowers the fit to the training set and is often used to reduce overfitting
    max_depth=3,            # maximum depth of the tree, not counting the root; too large overfits, too small underfits
    min_samples_leaf=10,    # each child node produced by a split must contain at least 10 training samples; too large underfits, too small overfits
    min_samples_split=10,   # a node must contain at least 10 training samples before it may be split
)
clf = clf.fit(Xtrain, Ytrain)
score = clf.score(Xtest, Ytest)  # mean accuracy on the test set
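The modeling code above assumes that Xtrain, Xtest, Ytrain and Ytest already exist. As a minimal end-to-end sketch, the block below builds them from sklearn's bundled wine dataset and checks that the same random_state reproduces the same result. The dataset, the 30% test split and the helper `fit_and_score` are assumptions for illustration, not taken from the notes.

```python
from sklearn import tree
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

# Assumption: the wine dataset stands in for the data used in these notes.
wine = load_wine()
Xtrain, Xtest, Ytrain, Ytest = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=0
)

def fit_and_score(seed):
    """Fit a tree with the given random_state and return (model, test accuracy)."""
    clf = tree.DecisionTreeClassifier(
        criterion="entropy", random_state=seed, splitter="random", max_depth=3
    )
    clf = clf.fit(Xtrain, Ytrain)
    return clf, clf.score(Xtest, Ytest)

clf_a, score_a = fit_and_score(30)
clf_b, score_b = fit_and_score(30)

print(score_a, score_b)  # same seed, so the two scores match
```

Trying a few different seeds with `fit_and_score` is one way to see how much the randomness in tree building changes the score.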
Visualization code
import graphviz
feature_name = ['a', 'b', 'c']
dot_data = tree.export_graphviz(clf, feature_names=feature_name, class_names=['a', 'b', 'v'], filled=True, rounded=True)
graph = graphviz.Source(dot_data)
Note: feature_names gives the names of the features in the dataset, class_names gives the names of the labels, filled controls whether the nodes are colored, and rounded draws the node boxes with rounded corners.
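Rendering with graphviz needs the Graphviz binaries installed. When they are not available, the tree can still be inspected as text. The sketch below, which assumes the iris dataset purely as sample data, shows that export_graphviz returns the DOT source as a string when out_file is None, and that tree.export_text prints the same tree as indented rules.

```python
from sklearn import tree
from sklearn.datasets import load_iris

# Assumption: iris is used only as sample data for the illustration.
iris = load_iris()
clf = tree.DecisionTreeClassifier(max_depth=2, random_state=30)
clf = clf.fit(iris.data, iris.target)

# With out_file=None, export_graphviz returns the DOT source as a string.
dot_data = tree.export_graphviz(
    clf,
    out_file=None,
    feature_names=list(iris.feature_names),
    class_names=list(iris.target_names),
    filled=True,
    rounded=True,
)

# export_text shows the fitted tree as plain-text rules, no Graphviz needed.
text_view = tree.export_text(clf, feature_names=list(iris.feature_names))
print(text_view)
```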
# View the feature importances of the decision tree
clf.feature_importances_
list(zip(feature_name, clf.feature_importances_))  # pair each feature name with its importance
Parameter selection
import matplotlib.pyplot as plt
test=[]
for i in range(10):
    clf=tree.DecisionTreeClassifier(max_depth=i+1
                                    ,criterion = "entropy"
                                    ,random_state=30)
    clf=clf.fit(Xtrain,Ytrain)
    score=clf.score(Xtest,Ytest)
    test.append(score)
plt.plot(range(1,11),test,color='red',label='max_depth')
plt.legend()
plt.show()
Loop over candidate values of a parameter and keep the value that gives the best score. This works not only for max_depth but for the other parameters as well.
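When several parameters are tuned together, the manual loop grows quickly. One possible alternative, sketched here under the assumption that the wine dataset stands in for the real data, is sklearn's GridSearchCV, which tries every combination in a grid and scores each one by cross-validation. The grid values below are illustrative, not recommendations.

```python
from sklearn import tree
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, train_test_split

wine = load_wine()
Xtrain, Xtest, Ytrain, Ytest = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=0
)

# Hypothetical grid; the ranges are for illustration only.
param_grid = {
    "max_depth": [2, 3, 4, 5],
    "min_samples_leaf": [1, 5, 10],
    "criterion": ["gini", "entropy"],
}

search = GridSearchCV(
    tree.DecisionTreeClassifier(random_state=30),
    param_grid,
    cv=5,  # 5-fold cross-validation on the training set only
)
search.fit(Xtrain, Ytrain)

print(search.best_params_)
print(search.best_score_)          # mean cross-validated accuracy of the best setting
print(search.score(Xtest, Ytest))  # accuracy of the refitted best model on the test set
```

Unlike the loop above, this selects parameters using only the training data, so the test score stays an untouched estimate of how the chosen model generalizes.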
Important interfaces
clf.apply(Xtest)  # Returns the index of the leaf node that each sample falls into
clf.predict(Xtest)  # Returns the predicted class (or regression value) for each sample
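A short sketch of what the two interfaces return, assuming the iris dataset as sample data: both give one value per test sample, and samples that land in the same leaf receive the same prediction.

```python
from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: iris is used only as sample data for the illustration.
iris = load_iris()
Xtrain, Xtest, Ytrain, Ytest = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0
)
clf = tree.DecisionTreeClassifier(max_depth=3, random_state=30)
clf = clf.fit(Xtrain, Ytrain)

leaf_ids = clf.apply(Xtest)     # index of the leaf each test sample ends up in
predicted = clf.predict(Xtest)  # predicted class label for each test sample

print(leaf_ids[:5])
print(predicted[:5])
```

Both methods expect a 2D array of shape (n_samples, n_features), so a single sample has to be reshaped into one row before it is passed in.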