
Decision tree learning notes

2022-06-21 20:39:00 Running white gull

Types of decision trees

Module: sklearn.tree

tree.DecisionTreeClassifier: Classification tree

tree.DecisionTreeRegressor: Regression tree

tree.export_graphviz: Exports the generated decision tree in DOT format, which is meant for drawing, so the decision tree can be visualized

tree.ExtraTreeClassifier: An extremely randomized version of the classification tree

tree.ExtraTreeRegressor: An extremely randomized version of the regression tree
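
All of these names live in sklearn.tree; a quick import sketch:

from sklearn.tree import (DecisionTreeClassifier, DecisionTreeRegressor,
                          ExtraTreeClassifier, ExtraTreeRegressor,
                          export_graphviz)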

The process of decision tree modeling

Rather than growing one deterministic tree, sklearn integrates several candidate classifiers to filter out the final decision tree: several random subsets of features are selected to grow candidate trees, and the tree that fits best (highest score) is kept as the final model.

How each node is split is decided by entropy or the Gini coefficient; the computed value measures impurity, and the lower the impurity, the more uniform the labels of the samples grouped in the node. For example, given apples, pears, and bananas, a node may be classified as "apple" while still containing a few pears and bananas, so its impurity is above zero. When the impurity is 0, all records in the node carry the same label; in a fully grown tree, the leaf nodes have impurity 0.
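
A minimal sketch of how the two impurity measures behave on the fruit example above (the helper functions are illustrative, not sklearn internals; entropy is -Σ p·log2(p) and Gini is 1 - Σ p²):

import numpy as np

def entropy(labels):
    # Shannon entropy of a label list; 0 means the node is pure
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    # Gini impurity of a label list; 0 means the node is pure
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

node = ['apple'] * 8 + ['pear'] * 1 + ['banana'] * 1   # a mostly-apple node
print(entropy(node))            # ~0.92, still impure
print(gini(node))               # ~0.34, still impure
print(entropy(['apple'] * 10))  # 0.0, a pure leaf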

Related code:

Modeling code

from sklearn import tree

clf = tree.DecisionTreeClassifier(
    criterion='entropy',    # impurity measure; the default is 'gini'
    random_state=30,        # fixes the randomness of feature selection, so the same value always grows the same tree
    splitter='random',      # how features are chosen at each split: 'best' branches on the more important features, 'random' branches more randomly, which loosens the fit and is often used to counter overfitting
    max_depth=3,            # maximum depth of the tree (the root layer is not counted); too large overfits, too small underfits
    min_samples_leaf=10,    # each child of a split must keep at least 10 training samples; too large underfits, too small overfits
    min_samples_split=10)   # a node must contain at least 10 samples before it may be split

clf = clf.fit(Xtrain, Ytrain)

score = clf.score(Xtest, Ytest)
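
Xtrain, Ytrain, Xtest, Ytest are never defined in these notes; a runnable end-to-end sketch, assuming sklearn's bundled wine dataset as stand-in data:

from sklearn import tree
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

wine = load_wine()
Xtrain, Xtest, Ytrain, Ytest = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=30)

clf = tree.DecisionTreeClassifier(criterion='entropy', max_depth=3, random_state=30)
clf = clf.fit(Xtrain, Ytrain)
print(clf.score(Xtest, Ytest))   # mean accuracy on the held-out test split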

Visualization code

import graphviz

feature_name = ['a', 'b', 'c']

dot_data = tree.export_graphviz(clf,
                                feature_names=feature_name,   # the keyword is feature_names, not feature_name
                                class_names=['a', 'b', 'v'],
                                filled=True,
                                rounded=True)

graph = graphviz.Source(dot_data)

Note: feature_names are the feature names of the dataset, class_names are the labels of the records, and filled controls whether the nodes are filled with color.
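
To save the figure to disk instead of displaying it inline (a side note, not in the original), graphviz.Source objects expose a render() method; the filename here is arbitrary:

graph.render('decision_tree')   # writes decision_tree.pdf next to the DOT source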

# Viewing feature importance from the decision tree

clf.feature_importances_

list(zip(feature_name, clf.feature_importances_))

Parameter selection

import matplotlib.pyplot as plt

test = []
for i in range(10):
    clf = tree.DecisionTreeClassifier(max_depth=i + 1,
                                      criterion="entropy",
                                      random_state=30)
    clf = clf.fit(Xtrain, Ytrain)    # fit and evaluate one tree per depth
    score = clf.score(Xtest, Ytest)
    test.append(score)

plt.plot(range(1, 11), test, color='red', label='max_depth')
plt.legend()
plt.show()

Iterate over a parameter's candidate values and choose the one that scores best; the loop is not limited to max_depth, and other parameters can be tuned the same way.
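
The same idea scales to several parameters at once. A hedged sketch using sklearn's GridSearchCV (not covered in the original notes), reusing Xtrain and Ytrain from above:

from sklearn import tree
from sklearn.model_selection import GridSearchCV

params = {'max_depth': range(1, 11),
          'min_samples_leaf': [1, 5, 10],
          'criterion': ['gini', 'entropy']}
gs = GridSearchCV(tree.DecisionTreeClassifier(random_state=30), params, cv=5)
gs.fit(Xtrain, Ytrain)
print(gs.best_params_, gs.best_score_)   # best combination and its cross-validated score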

Important interfaces

clf.apply(Xtest)     # returns the index of the leaf node that each sample falls into

clf.predict(Xtest)   # returns the predicted class (classifier) or value (regressor) for each sample
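
A short usage sketch tying the two interfaces together (continuing from the fitted clf above): every sample that lands in the same leaf receives that leaf's prediction.

import numpy as np

leaf_ids = clf.apply(Xtest)    # one leaf index per sample
preds = clf.predict(Xtest)     # one predicted label per sample

for leaf in np.unique(leaf_ids):
    # all samples in a given leaf share the same predicted label
    assert len(np.unique(preds[leaf_ids == leaf])) == 1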


Copyright notice: This article was created by [Running white gull]. Please include a link to the original when reposting: https://yzsam.com/2022/172/202206211853335313.html