Classic model – ResNet
2022-06-26 03:28:00 【On the right is my goddess】
Introduction
The strength of a deep convolutional neural network is its many layers: each layer captures different information, from low-level visual features up to high-level semantic features.
But is having so many layers always a good thing?
Clearly not. As the network gets deeper, gradients can explode or vanish.
The common remedies are careful weight initialization and adding BN (batch normalization) layers.
However, even though the model converges after these fixes, its accuracy drops. This is not caused by overfitting, because both the training error and the test error go up, as shown in the figure below.

Thinking further: in theory, if a shallow network already performs well, a deeper network should not perform worse, because the added layers could at least learn an identity mapping. In practice, however, plain SGD fails to find that solution.
The paper therefore proposes a deep residual learning framework, which keeps performance from degrading as depth increases. It amounts to explicitly building an identity mapping into the network.

The core idea: suppose the mapping the layers should produce is $H(x)$. Instead of having the model learn $H(x)$ directly, we let it learn $H(x) - x$, which we denote $F(x)$. The final output is then $F(x) + x$. We call $F(x)$ the residual. Intuitively, the layers do not learn how to produce $H(x)$ from scratch; they learn the residual between what has already been learned and the true target.
The advantages: model complexity does not increase, and neither does the amount of computation (an identity shortcut adds no parameters, only an element-wise addition).
Experiments show that the plain version (no residual/shortcut connection) performs worse, and that with residual connections performance improves as depth increases.
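The residual formulation above can be illustrated with a minimal sketch in plain Python. The function names (`residual_block`, `zero_residual`, `small_residual`) and the list-of-floats representation are assumptions made for illustration, not code from the paper. The point is that when the learned branch F outputs zeros, the block is exactly an identity mapping.

```python
def residual_block(x, residual_fn):
    """Return F(x) + x, where residual_fn plays the role of F."""
    fx = residual_fn(x)
    if len(fx) != len(x):
        raise ValueError("F(x) and x must have the same shape to be added")
    return [f + xi for f, xi in zip(fx, x)]


def zero_residual(x):
    """An F that outputs all zeros, e.g. a branch whose weights are all zero."""
    return [0.0 for _ in x]


def small_residual(x):
    """A hypothetical F that applies a small correction to each element."""
    return [0.1 * xi for xi in x]


x = [1.0, -2.0, 3.0]
identity_out = residual_block(x, zero_residual)    # block acts as identity
corrected_out = residual_block(x, small_residual)  # identity plus a correction
print(identity_out)
print(corrected_out)
```

Driving F toward zero is enough to recover the identity, which is presumably easier for the optimizer than making a stack of nonlinear layers reproduce its input.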
Deep Residual Learning


The figure above shows four versions of the ResNet structure. The versions with 50 or more layers are built differently from the 18- and 34-layer versions. As the network gets deeper we want more channels, since greater depth means more can be learned; but to keep the parameter count in check, a bottleneck structure is used, in which 1x1 convolutions compress the channels and then restore them.
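To see why the bottleneck helps, here is a rough parameter count, as a sketch. The widths used (256 channels outside the block, 64 inside) are illustrative assumptions, and biases and BN parameters are ignored. It compares two 3x3 convolutions at full width with a 1x1 compress, 3x3, 1x1 restore sequence.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weight count of a 2D convolution, ignoring the bias term."""
    return in_channels * out_channels * kernel_size * kernel_size


channels = 256    # assumed width at the block's input and output
bottleneck = 64   # assumed compressed width inside the bottleneck

# Two plain 3x3 convolutions, both operating at the full width.
basic_params = 2 * conv_params(channels, channels, 3)

# 1x1 compress -> 3x3 at the reduced width -> 1x1 restore.
bottleneck_params = (
    conv_params(channels, bottleneck, 1)
    + conv_params(bottleneck, bottleneck, 3)
    + conv_params(bottleneck, channels, 1)
)

print(basic_params, bottleneck_params)  # 1179648 69632
```

Under these assumed widths the bottleneck needs roughly 17 times fewer weights, which is what makes the wider, deeper versions affordable.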
The model uses BN layers and data augmentation to improve generalization, but it does not use dropout, because it has no fully connected layers apart from the final classifier.
So how do residual connections handle inputs and outputs of different shapes?
The first option is to pad the input and the output with extra zeros so that their shapes match;
The second option is to use a 1x1 convolution as a projection.
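The two options can be sketched for a single pixel, where a feature is just a list with one value per channel. This is a simplified illustration: the helper names and the weight values are hypothetical, and any change in spatial size is ignored. At one pixel, a 1x1 convolution reduces to a matrix-vector product over channels.

```python
def zero_pad_shortcut(x, out_channels):
    """Option 1: append zero channels so that x matches the output width."""
    if out_channels < len(x):
        raise ValueError("zero padding can only increase the channel count")
    return list(x) + [0.0] * (out_channels - len(x))


def projection_shortcut(x, weight):
    """Option 2: 1x1 convolution at one pixel, i.e. weight @ x.

    weight is a list of rows with shape (out_channels, in_channels).
    """
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


x = [1.0, 2.0]             # 2 input channels at one pixel
fx = [0.5, 0.5, 0.5, 0.5]  # hypothetical residual branch output, 4 channels

padded = zero_pad_shortcut(x, len(fx))
# Hypothetical projection weights; in a real network these are learned.
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
projected = projection_shortcut(x, weight)

out_padded = [f + s for f, s in zip(fx, padded)]
out_projected = [f + s for f, s in zip(fx, projected)]
print(out_padded)     # [1.5, 2.5, 0.5, 0.5]
print(out_projected)  # [1.5, 2.5, 3.5, -0.5]
```

Zero padding adds no parameters, while the projection introduces a learned weight matrix on the shortcut path.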
Experiments

This figure compares the 18-layer and 34-layer versions with and without residual connections. It shows that:
- The training error is initially larger than the test error, which is a result of data augmentation;
- Each sudden drop comes from a decrease in the learning rate. Nowadays the multiply-by-0.1 approach is generally not used, because the timing is hard to get right: decaying too early leads to weak convergence later on;
- With residual connections, convergence is faster and the final performance is also better.
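The multiply-by-0.1 schedule mentioned in the list above can be sketched as a small function. The base learning rate and the milestone epochs below are hypothetical values chosen for illustration; they are not taken from the experiment.

```python
def step_lr(base_lr, epoch, milestones, gamma=0.1):
    """Learning rate after multiplying by gamma at every milestone reached."""
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops


milestones = [30, 60]  # hypothetical epochs at which the rate is cut
schedule = {e: step_lr(0.1, e, milestones) for e in (0, 29, 30, 59, 60)}
for epoch, lr in schedule.items():
    print(epoch, lr)
```

The difficulty the text points out is visible here: the milestones are fixed by hand, so a milestone placed too early shrinks the step size before the model has made enough progress.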
Why does ResNet train fast?
Gradients vanish because, as the network gets deeper, the chain rule multiplies together many very small numbers, so gradient descent ends up subtracting a value close to 0. Of course, if training falls into a local optimum, the gradient also vanishes easily even when the network is not deep;
With ResNet, however, the gradient of the shallower path is added on top of the original one. The gradient through the deep path is small, but the one through the shallow path is still relatively large, so mathematically the gradient does not vanish easily.
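A toy scalar calculation makes the argument concrete. Since the derivative of $F(x) + x$ with respect to $x$ is $F'(x) + 1$, every residual block contributes a factor of $F'(x) + 1$ to the chain rule instead of $F'(x)$ alone. The sketch below assumes scalar layers and a hypothetical local derivative of 0.1 in each of 20 layers; real networks involve Jacobian matrices, so this only shows the trend.

```python
import math


def plain_gradient(local_derivs):
    """Chain rule through plain layers: the product of the local derivatives."""
    return math.prod(local_derivs)


def residual_gradient(local_derivs):
    """Chain rule through blocks y = F(x) + x: each factor is F'(x) + 1."""
    return math.prod(d + 1.0 for d in local_derivs)


local_derivs = [0.1] * 20  # hypothetical small per-layer derivatives
plain = plain_gradient(local_derivs)
residual = residual_gradient(local_derivs)
print(plain)     # on the order of 1e-20: the gradient has vanished
print(residual)  # stays above 1: the identity term keeps it alive
```

The identity term keeps each factor from collapsing toward zero, which is the sense in which the shallow-path gradient is added to the deep-path one.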
The so-called reduction in model complexity does not mean the model can no longer represent other things; it means we can find a less complex model that fits the data. As the authors say, without residual connections the network could in theory still learn an identity mapping (with the extra layers contributing nothing), but in practice it does not, because nothing guides the network toward that solution and the optimizer simply cannot get there. So the shortcut has to be added by hand, which makes it easier to train a simple model that fits the data and is equivalent to reducing the model's complexity. (Excerpted from here)