
Decision tree learning notes

2022-06-21 20:39:00 Running white gull

Types of decision trees

Module: sklearn.tree

tree.DecisionTreeClassifier: Classification tree

tree.DecisionTreeRegressor: Regression tree

tree.export_graphviz: Exports the generated decision tree in DOT format, which is meant for drawing, so the decision tree can be visualized

tree.ExtraTreeClassifier: An extremely randomized version of the classification tree

tree.ExtraTreeRegressor: An extremely randomized version of the regression tree
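
All of these names live in sklearn.tree; a quick import sketch:

from sklearn.tree import (DecisionTreeClassifier, DecisionTreeRegressor,
                          ExtraTreeClassifier, ExtraTreeRegressor,
                          export_graphviz)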

The process of decision tree modeling

Rather than growing one deterministic tree, sklearn integrates several candidate classifiers to filter out the final decision tree: several random subsets of features are selected to grow candidate trees, and the tree that fits best (highest score) is kept as the final model.

How each node is split is decided by entropy or the Gini coefficient; the computed value measures impurity, and the lower the impurity, the more uniform the labels of the samples grouped in the node. For example, given apples, pears, and bananas, a node may be classified as "apple" while still containing a few pears and bananas, so its impurity is above zero. When the impurity is 0, all records in the node carry the same label; in a fully grown tree, the leaf nodes have impurity 0.
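
A minimal sketch of how the two impurity measures behave on the fruit example above (the helper functions are illustrative, not sklearn internals; entropy is -Σ p·log2(p) and Gini is 1 - Σ p²):

import numpy as np

def entropy(labels):
    # Shannon entropy of a label list; 0 means the node is pure
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    # Gini impurity of a label list; 0 means the node is pure
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

node = ['apple'] * 8 + ['pear'] * 1 + ['banana'] * 1   # a mostly-apple node
print(entropy(node))            # ~0.92, still impure
print(gini(node))               # ~0.34, still impure
print(entropy(['apple'] * 10))  # 0.0, a pure leaf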

Related code:

Modeling code

from sklearn import tree

clf = tree.DecisionTreeClassifier(
    criterion='entropy',    # impurity measure; the default is 'gini'
    random_state=30,        # fixes the randomness of feature selection, so the same value always grows the same tree
    splitter='random',      # how features are chosen at each split: 'best' branches on the more important features, 'random' branches more randomly, which loosens the fit and is often used to counter overfitting
    max_depth=3,            # maximum depth of the tree (the root layer is not counted); too large overfits, too small underfits
    min_samples_leaf=10,    # each child of a split must keep at least 10 training samples; too large underfits, too small overfits
    min_samples_split=10)   # a node must contain at least 10 samples before it may be split

clf = clf.fit(Xtrain, Ytrain)

score = clf.score(Xtest, Ytest)
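
Xtrain, Ytrain, Xtest, Ytest are never defined in these notes; a runnable end-to-end sketch, assuming sklearn's bundled wine dataset as stand-in data:

from sklearn import tree
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

wine = load_wine()
Xtrain, Xtest, Ytrain, Ytest = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=30)

clf = tree.DecisionTreeClassifier(criterion='entropy', max_depth=3, random_state=30)
clf = clf.fit(Xtrain, Ytrain)
print(clf.score(Xtest, Ytest))   # mean accuracy on the held-out test split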

Visualization code

import graphviz

feature_name = ['a', 'b', 'c']

dot_data = tree.export_graphviz(clf,
                                feature_names=feature_name,   # the keyword is feature_names, not feature_name
                                class_names=['a', 'b', 'v'],
                                filled=True,
                                rounded=True)

graph = graphviz.Source(dot_data)

Note: feature_names are the feature names of the dataset, class_names are the labels of the records, and filled controls whether the nodes are filled with color.
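
To save the figure to disk instead of displaying it inline (a side note, not in the original), graphviz.Source objects expose a render() method; the filename here is arbitrary:

graph.render('decision_tree')   # writes decision_tree.pdf next to the DOT source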

# Viewing feature importance from the decision tree

clf.feature_importances_

list(zip(feature_name, clf.feature_importances_))

Parameter selection

import matplotlib.pyplot as plt

test = []
for i in range(10):
    clf = tree.DecisionTreeClassifier(max_depth=i + 1,
                                      criterion="entropy",
                                      random_state=30)
    clf = clf.fit(Xtrain, Ytrain)    # fit and evaluate one tree per depth
    score = clf.score(Xtest, Ytest)
    test.append(score)

plt.plot(range(1, 11), test, color='red', label='max_depth')
plt.legend()
plt.show()

Iterate over a parameter's candidate values and choose the one that scores best; the loop is not limited to max_depth, and other parameters can be tuned the same way.
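
The same idea scales to several parameters at once. A hedged sketch using sklearn's GridSearchCV (not covered in the original notes), reusing Xtrain and Ytrain from above:

from sklearn import tree
from sklearn.model_selection import GridSearchCV

params = {'max_depth': range(1, 11),
          'min_samples_leaf': [1, 5, 10],
          'criterion': ['gini', 'entropy']}
gs = GridSearchCV(tree.DecisionTreeClassifier(random_state=30), params, cv=5)
gs.fit(Xtrain, Ytrain)
print(gs.best_params_, gs.best_score_)   # best combination and its cross-validated score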

Important interfaces

clf.apply(Xtest)     # returns the index of the leaf node that each sample falls into

clf.predict(Xtest)   # returns the predicted class (classifier) or value (regressor) for each sample
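
A short usage sketch tying the two interfaces together (continuing from the fitted clf above): every sample that lands in the same leaf receives that leaf's prediction.

import numpy as np

leaf_ids = clf.apply(Xtest)    # one leaf index per sample
preds = clf.predict(Xtest)     # one predicted label per sample

for leaf in np.unique(leaf_ids):
    # all samples in a given leaf share the same predicted label
    assert len(np.unique(preds[leaf_ids == leaf])) == 1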


Copyright notice: This article was created by [Running white gull]. Please include a link to the original when reposting: https://yzsam.com/2022/172/202206211853335313.html