Cross validation -- a story that cannot be explained clearly
2022-06-26 01:37:00 【Green Lantern swordsman】
“You think you understand, yet others can leave you speechless at any minute; you think you don't understand, when in fact you can't tell whether you understand or not.”
I. "Understanding AI datasets in one article: training set, validation set, test set"
Its summary of the role of the validation set is correct, but I think its explanation of cross validation is wrong, so skip the cross validation part of that article.
II. Types of cross validation and their advantages and disadvantages
First, one point to state up front:
Ideally, the training data and the validation data should follow the same distribution.
What types of cross validation are there? This is a question that is easy to get wrong. In fact, the common K-fold cross validation is not the only method in this category; other methods belong to it as well. Link one organizes this material well.
(1) Hold-out method
In a machine learning task, after obtaining the data, we first divide the original dataset into three parts: training set, validation set, and test set.
The training set is used to train the model, the validation set is used to select the model's parameters and configuration, and the test set, which the model never sees, is used to assess the model's generalization ability.
The disadvantage: the split is done only once. When the original dataset is small, the result is sensitive to whether the split data follow the same distribution as the original dataset, so different splits lead to different optimal models. In addition, dividing the data into three sets leaves less data for training.
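The three-way split described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library; the function name `holdout_split` and the 60/20/20 proportions are assumptions made for the example, not something the linked articles prescribe.

```python
import random

def holdout_split(data, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle once, then cut the data into train / validation / test.

    The name, the fractions, and the seed are illustrative assumptions.
    """
    items = list(data)
    random.Random(seed).shuffle(items)   # a single random split
    n_test = int(len(items) * test_frac)
    n_val = int(len(items) * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test

train, val, test = holdout_split(range(10))
print(len(train), len(val), len(test))   # 6 2 2
```

Changing `seed` changes which observations land in each set, which is exactly the sensitivity to the split that the hold-out method suffers from on small datasets.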
(2) K-fold cross validation (content drawn from Link one and Link three)
Suppose there are n observations, and we divide them into K groups. We train the model on K-1 groups, then use the trained model to predict the remaining group and compute the prediction error on it. There are K ways to choose K-1 groups out of K; equivalently, each of the K groups takes a turn as the held-out prediction group. This yields K prediction errors, and averaging them gives the cross validation error. This procedure is called K-fold cross validation.
In fact, the word “cross” in “cross validation” means that a given piece of data plays a different role in different folds: in one fold it is in the validation set, and in the others it is in the training set.
Advantage: K-fold cross validation averages the results over K different splits, which reduces variance, so the estimated model performance is less sensitive to how the data are partitioned.
The extreme case of K-fold cross validation is “leave-one-out”, where K equals n and each fold holds out a single observation.
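The K-fold procedure can be sketched in a few lines of plain Python. The "model" here is a deliberately trivial one that predicts the mean of the training targets, and the helper names `k_fold_indices` and `cross_val_error` are hypothetical; the point is only to show each group taking its turn as the held-out set and the K errors being averaged.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most 1."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_val_error(y, k):
    """K-fold cross validation error (mean squared error) of a toy mean predictor."""
    fold_errors = []
    for held_out in k_fold_indices(len(y), k):
        held = set(held_out)
        # Train on the other K-1 groups: here "training" is just taking the mean.
        train = [y[i] for i in range(len(y)) if i not in held]
        prediction = sum(train) / len(train)
        # Prediction error on the held-out group.
        mse = sum((y[i] - prediction) ** 2 for i in held_out) / len(held_out)
        fold_errors.append(mse)
    # Average the K prediction errors.
    return sum(fold_errors) / k

y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(cross_val_error(y, k=3))        # 3-fold cross validation
print(cross_val_error(y, k=len(y)))   # leave-one-out: K equals n
```

The folds here are contiguous and unshuffled to keep the sketch short; in practice the data are usually shuffled first so that each fold resembles the overall distribution.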
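(3) Bootstrap method
A training set of size n is drawn from the n original observations by sampling with replacement; the observations that are never drawn can be used for validation.
Disadvantage: the training set generated this way has a different data distribution from the original dataset, which introduces estimation bias.
This method is rarely used unless the dataset is really small.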
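As a rough illustration of bootstrap sampling, the sketch below draws n indices with replacement and treats the never-drawn ("out-of-bag") indices as the validation set. The function name `bootstrap_split` is an assumption for the example, and this is the standard textbook form of the method rather than anything specific to the linked articles.

```python
import random

def bootstrap_split(n, seed=0):
    """Draw n training indices with replacement; unseen indices form the validation set."""
    rng = random.Random(seed)
    train_idx = [rng.randrange(n) for _ in range(n)]   # duplicates are expected
    chosen = set(train_idx)
    oob_idx = [i for i in range(n) if i not in chosen]  # out-of-bag observations
    return train_idx, oob_idx

train_idx, oob_idx = bootstrap_split(1000)
# Fraction of distinct observations in the training set; for large n this
# tends toward 1 - 1/e, roughly 0.632.
print(len(set(train_idx)) / 1000)
```

Because some observations appear several times in the training set and others not at all, the training distribution differs from the original one, which is the source of the estimation bias mentioned above.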
III. The significance of cross validation
1. Role one, summarized from Link one
It is mostly used when data are scarce, to prevent overfitting. It is generally not used in deep learning or when running standard benchmark datasets.
(1) All the data are fed to the model, so the model can extract as much useful information as possible from the limited data.
(2) During training, every piece of data gets a turn in the validation set. Inconsistent data distributions can be detected from validation performance, and the model's performance on new data can be examined more thoroughly, which helps prevent overfitting.
2. Role two, summarized from Link one and Link three
Role: finding suitable model parameters
In model selection, suppose the candidate models are indexed by a tuning parameter, so that each value of the tuning parameter determines one model. Compute the cross validation error for each value, then select the tuning parameter that minimizes it. This is the model selection process.
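The selection loop can be sketched as below. As an assumed example, the tuning parameter is the number of neighbors in a one-dimensional nearest-neighbor regressor, and the data are synthetic; `knn_predict`, `cv_error`, and the candidate list are all hypothetical names and values chosen for illustration.

```python
import random

def knn_predict(train, x, k):
    """Predict by averaging the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(t for _, t in nearest) / len(nearest)

def cv_error(data, k_neighbors, n_folds=5):
    """Cross validation error (MSE) for one value of the tuning parameter."""
    folds = [data[i::n_folds] for i in range(n_folds)]   # interleaved folds
    errors = []
    for i, held_out in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        mse = sum((t - knn_predict(train, x, k_neighbors)) ** 2
                  for x, t in held_out) / len(held_out)
        errors.append(mse)
    return sum(errors) / n_folds

# Synthetic data: y = x^2 plus a little Gaussian noise (an assumption for the demo).
rng = random.Random(0)
data = []
for _ in range(60):
    x = rng.uniform(-1, 1)
    data.append((x, x * x + rng.gauss(0, 0.1)))

candidates = [1, 3, 5, 10, 30]                      # tuning parameter values
cv_errors = {k: cv_error(data, k) for k in candidates}
best_k = min(cv_errors, key=cv_errors.get)          # minimizes the CV error
print(cv_errors)
print("selected tuning parameter:", best_k)
```

Once the tuning parameter is chosen, the usual practice is to refit the model with that value on all the training data and report generalization performance on a separate test set.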