Decision tree and random forest
2022-06-26 17:51:00 【lmn_】

0x01 Decision tree overview
A decision tree is a supervised machine learning model that can be used for both classification and regression problems. The tree asks a sequence of questions, and the answer to each question determines which branch to follow next.
Once the tree is built, we know which variable each node splits on and which value it uses as the split point, so a prediction only requires following one path from the root to a leaf.
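
To make the idea of following a route through the tree concrete, here is a minimal sketch in Python. The tree is written by hand rather than learned, and the feature names (`humidity`, `wind_speed`), thresholds, and labels are invented for illustration only.

```python
# A hand-written toy tree: each internal node asks "is feature <= threshold?".
# Feature names, thresholds, and labels are invented for illustration.
TREE = {
    "feature": "humidity",
    "threshold": 70,
    "left": {"label": "play"},            # humidity <= 70
    "right": {                            # humidity > 70
        "feature": "wind_speed",
        "threshold": 20,
        "left": {"label": "play"},        # wind_speed <= 20
        "right": {"label": "stay home"},  # wind_speed > 20
    },
}


def predict(node, sample):
    """Follow one route from the root to a leaf and return the leaf's label."""
    while "label" not in node:
        if sample[node["feature"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["label"]


print(predict(TREE, {"humidity": 65, "wind_speed": 30}))  # play
print(predict(TREE, {"humidity": 85, "wind_speed": 30}))  # stay home
```

Each prediction touches at most one node per level, which is why prediction with a single tree is cheap.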

Advantages of decision trees
- Easy to interpret and visualize
- The internal workings can be inspected, which makes results reproducible
- Adapts quickly to a data set
- Handles both numerical and categorical data
- The final model can be viewed and explained in an orderly way as a tree diagram
- Good performance on large datasets
- Fast to train and to predict with
Disadvantages of decision trees
- Building a decision tree requires an algorithm that can determine the best split at each node
- Decision trees are prone to overfitting, especially when the tree is very deep
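
The "best choice for each node" mentioned above is usually found by scoring candidate splits. The sketch below assumes Gini impurity as the score and an exhaustive search over thresholds; the post does not say which criterion it has in mind, and the helper names `gini` and `best_split` and the toy data are made up for this example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure group, higher for mixed groups."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Try every (feature index, threshold) pair; keep the lowest weighted impurity."""
    best = None  # (impurity, feature_index, threshold)
    n = len(rows)
    for feature_index in range(len(rows[0])):
        for threshold in sorted({row[feature_index] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[feature_index] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature_index] > threshold]
            if not left or not right:
                continue  # a split must send samples to both sides
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or impurity < best[0]:
                best = (impurity, feature_index, threshold)
    return best


# Invented toy data: [humidity, wind_speed] -> label
rows = [[60, 10], [65, 30], [80, 5], [85, 25], [90, 30]]
labels = ["play", "play", "play", "stay home", "stay home"]
print(best_split(rows, labels))  # (0.0, 0, 80)
```

Here splitting on feature 0 (humidity) at 80 separates the two classes perfectly, so the weighted impurity is 0.0. A full tree builder would apply the same search recursively to each side until a stopping rule is met.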
0x02 Random forest overview
Forests have almost the same hyperparameters as decision trees. In general, a single tree often cannot deliver effective, reliable results, and this is where random forests come in. A random forest is an ensemble learning method for classification, regression, and other tasks.

A random forest can be understood as a group of decision trees whose individual decisions are aggregated into one result. It is a tree-based machine learning algorithm that constructs a large number of decision trees during training and uses their combined output to make a prediction.
When building a random forest model, we have to define how many trees to grow and how many variables to consider at each node.
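
Aggregating many decisions into one result can be sketched as a majority vote for classification. The three "trees" below are invented stand-in functions rather than trained models, and `forest_predict` is a hypothetical helper name.

```python
from collections import Counter


def forest_predict(trees, sample):
    """Ask every tree for a label and return the most common answer."""
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0]


# Three invented stand-in "trees"; in a real forest each would be a trained decision tree.
trees = [
    lambda s: "play" if s["humidity"] <= 70 else "stay home",
    lambda s: "play" if s["wind_speed"] <= 20 else "stay home",
    lambda s: "play" if s["temperature"] >= 15 else "stay home",
]

sample = {"humidity": 85, "wind_speed": 10, "temperature": 22}
print(forest_predict(trees, sample))  # play (two votes against one)
```

For regression the same idea applies with an average of the trees' numeric outputs in place of the vote. An odd number of trees avoids ties in two-class problems.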
In 1995, Tin Kam Ho created the first random decision forest algorithm using the random subspace method. In Ho's formulation, it is a way to implement the "stochastic discrimination" approach to classification.
Random forests reduce variance by:
- Training each tree on a different sample of the data
- Using random subsets of features
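
The two ideas above can be sketched with the standard library alone. This assumes bootstrap sampling (drawing rows with replacement) as the way to get different data samples, which the post does not state explicitly; the helper names and toy data are invented. For brevity the feature subset is drawn once per tree, whereas implementations commonly redraw it at every node.

```python
import random


def bootstrap_sample(rows, labels, rng):
    """Draw len(rows) samples with replacement (some rows repeat, some are left out)."""
    indices = [rng.randrange(len(rows)) for _ in rows]
    return [rows[i] for i in indices], [labels[i] for i in indices]


def random_feature_subset(n_features, k, rng):
    """Pick k distinct feature indices to consider."""
    return sorted(rng.sample(range(n_features), k))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Invented toy data: [humidity, wind_speed, temperature] -> label
rows = [[60, 10, 22], [65, 30, 18], [80, 5, 25], [85, 25, 12], [90, 30, 9]]
labels = ["play", "play", "play", "stay home", "stay home"]

for tree_number in range(3):
    sample_rows, sample_labels = bootstrap_sample(rows, labels, rng)
    features = random_feature_subset(n_features=3, k=2, rng=rng)
    print(tree_number, sample_rows, features)
```

Because every tree sees a slightly different view of the data, their individual errors are less correlated, and combining them smooths those errors out.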

Random forest advantages
- Random decision forests correct the decision tree's tendency to overfit
- Random forests usually outperform single decision trees, but they are usually less accurate than gradient boosted trees
- More trees improve performance and make predictions more stable
Random forest disadvantages
- The model is more complex, because it is a combination of decision trees
- More trees slow down computation
0x03 The difference between decision tree and random forest
The key difference between the random forest algorithm and a decision tree is that a decision tree is a graph that uses a branching method to illustrate every possible outcome of a decision. By comparison, a random forest is a set of decision trees whose outputs are combined to produce the final result.
Compared with a random forest, a single decision tree is easier to build. For random forests, the final model is harder to visualize, and if the amount of data is very large or the data is not preprocessed appropriately, the model can take a long time to create.
A decision tree always has room to overfit; the random forest algorithm reduces overfitting by using multiple trees.
Decision trees require little computation, which shortens implementation time but also gives lower accuracy; random forests consume more computation, and building and analyzing them is time-consuming.
Decision trees can be easily visualized; visualizing a random forest is complex.
0x04 Build
Pruning means cutting branches of the tree back further so that it classifies the data in a better way. It works on the same principle as trimming the excess parts of a plant.
Pruning ends when a leaf node is reached. It is a very important part of building a decision tree.
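
As a rough illustration of pruning, the sketch below uses one very simple rule: working from the leaves upward, a split is removed when both of its children are leaves with the same label, since the split changes nothing. This rule is an assumption chosen for brevity, not necessarily the method the post has in mind; practical approaches such as reduced-error or cost-complexity pruning decide what to cut using validation data or a complexity penalty. The tree and its feature names are invented.

```python
def prune(node):
    """Bottom-up: collapse a node whose two children are leaves with the same label."""
    if "label" in node:
        return node  # already a leaf, nothing left to trim
    left = prune(node["left"])
    right = prune(node["right"])
    if "label" in left and "label" in right and left["label"] == right["label"]:
        return {"label": left["label"]}  # the split changes nothing, so drop it
    return {"feature": node["feature"], "threshold": node["threshold"],
            "left": left, "right": right}


# Invented toy tree: the wind_speed split is redundant because both sides say "play".
tree = {
    "feature": "humidity",
    "threshold": 70,
    "left": {
        "feature": "wind_speed",
        "threshold": 20,
        "left": {"label": "play"},
        "right": {"label": "play"},
    },
    "right": {"label": "stay home"},
}

print(prune(tree))
```

After pruning, the `wind_speed` node is replaced by a single `play` leaf, while the `humidity` split is kept because its two sides still disagree.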
0x05 Summary
Compared with a random forest, a decision tree is very simple. A decision tree combines a series of decisions, while a random forest combines several decision trees.
Decision trees are fast and easy to work with on large datasets. Random forest models require more rigorous training: the more trees in the forest, the more time training takes.
