Classic model – ResNet
2022-06-26 03:28:00 【On the right is my goddess】
Introduction
The advantage of a deep convolutional neural network lies in its many layers: each layer can capture different information, from low-level visual features up to high-level semantic features.
But is stacking so many layers always a good thing?
Obviously not. As the network gets deeper, gradients tend to explode or vanish.
The common remedies are careful weight initialization and adding BN (batch normalization) layers.
However, even when these tricks make the model converge, accuracy still degrades with depth. This is not overfitting, because both the training error and the test error increase, as shown in the figure below.
[Figure: training and test error of a shallower vs. a deeper plain network; both errors are higher for the deeper one]
A further thought: in theory, if a shallow network already performs well, a deeper network should perform no worse, because the extra layers could at least learn an identity mapping. In practice, however, plain SGD cannot find such a solution.
The paper therefore proposes a deep residual learning framework that guarantees performance does not degrade as depth increases; it amounts to explicitly constructing an identity mapping.
[Figure: a residual building block with a shortcut connection]
The core idea: suppose the desired output of the block is H(x). Instead of having the model learn H(x) directly, we make it learn F(x) = H(x) - x, so the final output is F(x) + x. We call F(x) the residual. Intuitively, the block does not learn how to produce H(x) from scratch; it learns the residual between what has already been computed and the target.
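To make this concrete, here is a minimal sketch of such a residual block in PyTorch (the framework is my choice; the article shows no code). It computes F(x) with two 3x3 convolutions and adds the input back, assuming input and output shapes match:

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """Minimal residual block: output = F(x) + x."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                               # the shortcut branch
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))            # F(x), the residual
        return self.relu(out + identity)           # H(x) = F(x) + x

x = torch.randn(1, 64, 56, 56)
print(BasicBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```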
The advantages: model complexity does not increase, and neither does the amount of computation, since the identity shortcut adds no parameters.
Experiments confirm that the plain version (without residual/shortcut connections) performs worse, and that with residual connections performance keeps improving as depth increases.
Deep Residual Learning
[Figure: the ResNet architecture table, comparing variants at different depths]
The figure above shows four versions of the ResNet architecture. Notice that the 50-layer and deeper versions are structured differently from the 18- and 34-layer ones. As the network deepens, we want to increase the number of channels, since more depth means the network can learn more; but to keep the parameter count under control, a bottleneck structure is used, in which 1x1 convolutions compress and then restore the channels.
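Below is a sketch of a bottleneck block under the same assumptions (PyTorch, illustrative channel numbers): a 1x1 convolution compresses the channels, the 3x3 convolution works in the compressed space, and a second 1x1 convolution restores four times as many channels:

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Bottleneck block: 1x1 compress -> 3x3 -> 1x1 restore."""

    expansion = 4  # output channels = mid_channels * 4

    def __init__(self, in_channels, mid_channels, stride=1, downsample=None):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, mid_channels, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(mid_channels)
        self.conv2 = nn.Conv2d(mid_channels, mid_channels, 3,
                               stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(mid_channels)
        self.conv3 = nn.Conv2d(mid_channels, mid_channels * self.expansion,
                               1, bias=False)
        self.bn3 = nn.BatchNorm2d(mid_channels * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample  # projects the shortcut when shapes differ

    def forward(self, x):
        identity = x if self.downsample is None else self.downsample(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        return self.relu(out + identity)
```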
The model uses BN layers and data augmentation to improve generalization, but not dropout, since it contains no fully connected layers (apart from the final classifier).
So how does the residual connection handle a mismatch between input and output shapes?
The first option is to pad the input with extra zeros so that the two shapes match;
The second option is to use a 1x1 convolution as a projection (see the sketch below).
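Continuing the PyTorch sketches above (and reusing the Bottleneck class), the second option looks like this; the channel counts and stride are illustrative:

```python
# Projection shortcut: a 1x1 convolution matches both the channel count
# and the spatial size of the main branch.
downsample = nn.Sequential(
    nn.Conv2d(64, 256, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm2d(256),
)

block = Bottleneck(64, 64, stride=2, downsample=downsample)
x = torch.randn(1, 64, 56, 56)
print(block(x).shape)  # torch.Size([1, 256, 28, 28])
```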
Experiments
[Figure: training curves of the 18- and 34-layer networks, plain vs. residual]
This figure compares the 18- and 34-layer versions with and without residual connections. It shows that:
- Early in training, the training error is larger than the test error; this is an effect of data augmentation;
- Every sudden drop in the curves comes from a learning-rate drop. Nowadays the multiply-by-0.1 schedule is used less, because the timing is hard to get right: dropping too early leads to weak convergence later on (see the scheduler sketch after this list);
- With residual connections, convergence is faster and the final performance is better.
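As a side note to the learning-rate point above, here is a sketch of both schedules in PyTorch; the optimizer settings, milestones, and epoch count are illustrative, not taken from the article:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 10)  # stand-in for the actual network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)

# Classic step schedule: multiply the learning rate by 0.1 at fixed epochs.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[30, 60], gamma=0.1)

# An alternative that avoids hand-picking the drop epochs:
# scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=90)

for epoch in range(90):
    # ... one epoch of training goes here ...
    scheduler.step()
```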
Why does ResNet train fast?
Gradients vanish because, as the network deepens, the chain rule multiplies together many very small numbers, so gradient descent ends up subtracting a value close to 0. (Of course, near a local optimum the gradient can vanish even without a deep network.)
With ResNet, however, the gradient flowing through the shortcut is added on top of the gradient flowing through the deep path: the deep path's contribution may be small, but the shortcut's contribution stays relatively large, so mathematically the gradient does not vanish as easily.
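This argument can be written in one line. For a block with output y = x + F(x), the chain rule gives (my formalization of the reasoning above):

```latex
\frac{\partial \mathcal{L}}{\partial x}
  = \frac{\partial \mathcal{L}}{\partial y}
    \left( 1 + \frac{\partial F}{\partial x} \right)
```

The constant 1 carries the upstream gradient through the shortcut unchanged, so even when the ∂F/∂x term is tiny, the overall gradient is not multiplied down toward zero.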
The claim of "reduced model complexity" does not mean the network can no longer represent other functions; it means we can find a less complex model that fits the data. As the author argues, without residual connections the new layers could in theory learn an identity mapping (losing nothing), but in practice this cannot be done: nothing guides the network toward that solution, so it never reaches it. By adding the identity manually, it becomes easy to train what is effectively a simpler model to fit the data, which amounts to reducing the model's complexity. (An excerpt from here.)