Chapter 4: Decision Tree Summary
2022-07-24 05:50:00 【CsdN317a】
Contents
Chapter 4: Decision Tree Summary
ID3 decision tree: uses information gain to choose the splitting attribute
C4.5 decision tree: uses gain ratio to choose the splitting attribute
CART decision tree: uses the Gini index to choose the splitting attribute
4. Continuous and missing values
Chapter 4: Decision Tree Summary
This chapter covers the basic procedure, split selection, pruning, and the handling of continuous and missing values.

1. The basic procedure
What is a decision tree?
A decision tree makes decisions by following a tree structure: each internal node tests an attribute, and each leaf gives a decision. Generating a decision tree is a recursive process.
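The recursive generation process can be sketched in a few lines. This is a minimal illustration, not the implementation from any particular library: samples are assumed to be dicts of discrete attribute values, the tree is a nested dict, and build_tree, majority_label and the choose_attribute callback are hypothetical names introduced here. The split criterion is passed in as a function so that any of the criteria from the next section can be plugged in.

```python
from collections import Counter


def majority_label(labels):
    """Return the most frequent class label."""
    return Counter(labels).most_common(1)[0][0]


def build_tree(rows, labels, attributes, choose_attribute):
    """Recursively build a tree; leaves are labels, internal nodes are dicts."""
    # Stop 1: every sample at this node has the same class.
    if len(set(labels)) == 1:
        return labels[0]
    # Stop 2: no attributes left, or all samples agree on every remaining attribute.
    if not attributes or all(
        all(row[a] == rows[0][a] for a in attributes) for row in rows
    ):
        return majority_label(labels)

    best = choose_attribute(rows, labels, attributes)
    node = {
        "attribute": best,
        # Fallback label for attribute values never seen at this node.
        "default": majority_label(labels),
        "children": {},
    }
    remaining = [a for a in attributes if a != best]
    # Only observed values are expanded, so no branch is ever empty.
    for value in sorted({row[best] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node["children"][value] = build_tree(
            [rows[i] for i in idx],
            [labels[i] for i in idx],
            remaining,
            choose_attribute,
        )
    return node


# Toy data; the "criterion" here simply takes the first attribute.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
]
labels = ["play", "stay", "play"]
tree = build_tree(rows, labels, ["outlook", "windy"], lambda r, l, a: a[0])
print(tree)
```

The recursion stops when a node is pure or when nothing is left to split on; everything else about the tree's shape is decided by the criterion passed as choose_attribute.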

2. Split selection
Different decision tree algorithms use different criteria to choose the splitting attribute:
ID3 decision tree: uses information gain to choose the splitting attribute


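Drawback: information gain is biased toward attributes with many distinct values. An attribute that takes a large number of values (for example, a sample ID) can score a high gain without being representative, so the resulting split generalizes poorly.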
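To make the criterion and its bias concrete, here is a small sketch of entropy and information gain, where gain is the entropy of the labels minus the weighted entropy of the subsets produced by the split. The function names and the toy data are illustrative assumptions, not part of the original notes.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on values."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


labels = ["yes", "yes", "no", "no"]
sample_id = [1, 2, 3, 4]        # unique per sample, useless for prediction
texture = ["a", "a", "b", "a"]  # hypothetical ordinary attribute

print(information_gain(sample_id, labels))
print(information_gain(texture, labels))
```

Because every subset produced by sample_id contains a single sample, its subsets are all pure and its gain equals the full entropy of the labels, which is the largest value possible. That is exactly the bias described above.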
C4.5 decision tree: uses gain ratio to choose the splitting attribute

Note: the gain ratio is biased toward attributes with few distinct values, so C4.5 does not simply pick the candidate attribute with the largest gain ratio. Instead it uses a heuristic: first keep the candidate attributes whose information gain is above average, then choose the one with the highest gain ratio among them.
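The two-step heuristic can be sketched as below. Gain ratio is taken as information gain divided by the entropy of the attribute's own value distribution (its intrinsic value). The names c45_choose and gain_ratio and the toy columns are assumptions for illustration, and the shortlist uses "at least average" rather than strictly above average so that it is never empty.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (base 2) of a list of discrete items."""
    total = len(items)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(items).values()
    )


def information_gain(values, labels):
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    return entropy(labels) - sum(
        len(g) / total * entropy(g) for g in groups.values()
    )


def gain_ratio(values, labels):
    """Information gain divided by the intrinsic value of the attribute."""
    intrinsic_value = entropy(values)
    if intrinsic_value == 0:  # attribute has a single value; nothing to split
        return 0.0
    return information_gain(values, labels) / intrinsic_value


def c45_choose(columns, labels):
    """Shortlist attributes with at-least-average gain, then maximize gain ratio."""
    gains = {name: information_gain(col, labels) for name, col in columns.items()}
    average = sum(gains.values()) / len(gains)
    shortlist = [name for name, gain in gains.items() if gain >= average]
    return max(shortlist, key=lambda name: gain_ratio(columns[name], labels))


columns = {
    "sample_id": [1, 2, 3, 4],
    "color": ["a", "a", "b", "b"],
    "size": ["s", "l", "s", "l"],
}
labels = ["yes", "yes", "no", "no"]
print(c45_choose(columns, labels))
```

In this toy data sample_id and color have the same information gain, but dividing by the intrinsic value penalizes the four-valued sample_id, so color wins.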
CART decision tree: uses the Gini index to choose the splitting attribute
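A minimal sketch of the Gini criterion follows. The Gini value of a sample set is one minus the sum of squared class proportions, and the Gini index of an attribute is the weighted Gini value of the subsets it produces; the attribute with the smallest index is chosen. This sketch splits on every value of a discrete attribute, whereas CART implementations commonly restrict themselves to binary splits; the function names and data are illustrative assumptions.

```python
from collections import Counter


def gini(labels):
    """Gini value: probability that two randomly drawn samples differ in class."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def gini_index(values, labels):
    """Weighted Gini value of the subsets produced by splitting on values."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    return sum(len(g) / total * gini(g) for g in groups.values())


def choose_by_gini(columns, labels):
    """Pick the attribute whose split has the smallest Gini index."""
    return min(columns, key=lambda name: gini_index(columns[name], labels))


columns = {
    "color": ["a", "a", "b", "b"],
    "size": ["s", "l", "s", "l"],
}
labels = ["yes", "yes", "no", "no"]
print(choose_by_gini(columns, labels))
```

A lower Gini index means purer subsets, so the selection uses min where the information-gain criteria use max.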


3. Pruning
Pre-pruning:
Working from the top down, compare validation-set accuracy before and after a candidate split to decide whether to split the current node. If the split does not improve accuracy, the node is not split further.
Advantages: some branches are never expanded, which reduces the risk of overfitting and saves training time.
Drawback: pre-pruning may cause underfitting.
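The pre-pruning test at a single node can be sketched as follows. The sketch assumes the validation samples passed in are the ones that reach this node, that there is at least one of them, and that each branch created by the split would predict the majority training label of its subset. The name should_split and the toy data are hypothetical.

```python
from collections import Counter


def majority(labels):
    """Return the most frequent class label."""
    return Counter(labels).most_common(1)[0][0]


def should_split(train_values, train_labels, val_values, val_labels):
    """Return True only if splitting improves accuracy on the validation samples.

    train_values / val_values hold the candidate attribute's value for each
    training / validation sample at the current node.
    """
    # Before the split: the node is a leaf predicting the majority class.
    leaf_label = majority(train_labels)
    acc_before = sum(y == leaf_label for y in val_labels) / len(val_labels)

    # After the split: each branch predicts the majority class of its subset.
    branch_label = {
        value: majority(
            [y for x, y in zip(train_values, train_labels) if x == value]
        )
        for value in set(train_values)
    }
    acc_after = sum(
        branch_label.get(x, leaf_label) == y  # unseen value: fall back to the leaf
        for x, y in zip(val_values, val_labels)
    ) / len(val_labels)

    return acc_after > acc_before


print(should_split(
    ["a", "a", "b", "b"], ["yes", "yes", "no", "no"],
    ["a", "b"], ["yes", "no"],
))
```

The strict comparison mirrors the rule stated above: a split that leaves validation accuracy unchanged is rejected.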
Post-pruning:
Working from the bottom up, first train the full tree, then examine nodes from the leaves upward and use validation-set accuracy to decide whether to prune, that is, whether to replace the subtree at the current node with a leaf.
Advantages: the risk of underfitting is small, and generalization performance is usually better than with pre-pruning.
Drawback: training takes longer than with pre-pruning.
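Post-pruning can be sketched on a nested-dict tree. The representation is an assumption made for this example: a leaf is a plain label, and an internal node is a dict with "attribute", "default" (the majority training label at that node) and "children". A subtree is replaced by its default label only when that strictly improves accuracy on the validation samples reaching it; pruning on ties as well would be an equally reasonable variant.

```python
def predict(tree, row):
    """Walk down the tree until a leaf (a plain label) is reached."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["attribute"]], tree["default"])
    return tree


def accuracy(tree, rows, labels):
    return sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(labels)


def post_prune(node, val_rows, val_labels):
    """Prune bottom-up; returns the (possibly replaced) node."""
    if not isinstance(node, dict):
        return node
    # Children first, each with the validation samples that reach it.
    for value, child in list(node["children"].items()):
        idx = [i for i, r in enumerate(val_rows) if r[node["attribute"]] == value]
        node["children"][value] = post_prune(
            child, [val_rows[i] for i in idx], [val_labels[i] for i in idx]
        )
    if not val_labels:
        return node  # no validation evidence here; keep the subtree
    subtree_acc = accuracy(node, val_rows, val_labels)
    leaf_acc = sum(y == node["default"] for y in val_labels) / len(val_labels)
    return node["default"] if leaf_acc > subtree_acc else node


tree = {
    "attribute": "windy",
    "default": "play",
    "children": {"yes": "stay", "no": "play"},
}
val_rows = [{"windy": "yes"}, {"windy": "no"}]
print(post_prune(tree, val_rows, ["play", "play"]))
```

Because the whole tree must exist before any node is examined, and every internal node is evaluated once, the extra training cost noted above is visible directly in the structure of the code.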
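4. Continuous and missing values
Handling continuous values:
Use bi-partition (a binary split on a threshold) to handle a continuous attribute: for each candidate threshold, divide the samples into the two resulting subsets and compute the information gain of that split, then keep the threshold with the largest gain.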
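A sketch of the bi-partition search is shown below. Candidate thresholds are assumed to be the midpoints between adjacent distinct values of the attribute, which is a common choice; best_threshold and the toy data are hypothetical names introduced for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def best_threshold(values, labels):
    """Return (threshold, gain) for the best binary split, or (None, 0.0)."""
    distinct = sorted(set(values))
    # Midpoints between neighbouring distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    base = entropy(labels)
    total = len(labels)
    best_t, best_gain = None, 0.0
    for t in candidates:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        gain = base - (
            len(left) / total * entropy(left)
            + len(right) / total * entropy(right)
        )
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


density = [1.0, 2.0, 3.0, 4.0]  # hypothetical continuous attribute
labels = ["no", "no", "yes", "yes"]
print(best_threshold(density, labels))
```

Unlike a discrete attribute, a continuous attribute can be reused further down the tree with a different threshold, so a sketch like this would be called again at each descendant node.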
Handling missing values:
1. To compute the information gain of an attribute that has missing values, first compute the information gain using only the samples whose value for that attribute is present, then multiply by a weight: the proportion of samples with a non-missing value among all samples.
2. When a sample whose value for the splitting attribute is missing reaches the split, it enters every branch at once, with a different weight for each branch: the proportion of the non-missing samples that fall into that branch.
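Both rules can be sketched together. The sketch assumes missing values are encoded as None and that every sample starts with weight 1, which is a simplification of schemes that carry a weight per sample down the tree. gain_with_missing and branch_weights are hypothetical names.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(values, labels):
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    return entropy(labels) - sum(
        len(g) / total * entropy(g) for g in groups.values()
    )


def gain_with_missing(values, labels):
    """Rule 1: gain on the non-missing samples, scaled by their proportion."""
    known = [(v, y) for v, y in zip(values, labels) if v is not None]
    if not known:
        return 0.0
    rho = len(known) / len(labels)  # share of samples with a known value
    known_values = [v for v, _ in known]
    known_labels = [y for _, y in known]
    return rho * information_gain(known_values, known_labels)


def branch_weights(values):
    """Rule 2: weight with which a missing-value sample enters each branch."""
    known = [v for v in values if v is not None]
    return {v: count / len(known) for v, count in Counter(known).items()}


values = ["a", "a", "b", None]
labels = ["yes", "yes", "no", "no"]
print(gain_with_missing(values, labels))
print(branch_weights(values))
```

With these toy values the sample whose attribute is missing would enter branch "a" with weight 2/3 and branch "b" with weight 1/3, so its total weight across branches still sums to 1.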