[Paper notes] Efficient CNN architecture design guided by visualization
2022-07-25 07:34:00
Paper

Preface
Modern efficient convolutional neural networks (CNNs) always use depthwise separable convolution (DSC) and neural architecture search (NAS) to reduce the number of parameters and the computational complexity, but they ignore some inherent characteristics of the network. Inspired by visualizing feature maps and N×N (N>1) convolution kernels, this paper introduces several guidelines for further improving parameter efficiency and inference speed. The parameter-efficient CNN architecture designed under these guidelines is called VGNetG, which achieves better accuracy and lower latency than previous networks with about 30%~50% fewer parameters. VGNetG-1.0MP achieves 67.7% top-1 accuracy on the ImageNet classification dataset with 0.99M parameters, and 69.2% top-1 accuracy with 1.14M parameters. Furthermore, the paper shows that edge detectors can replace the learnable depthwise convolution layers for mixing features, by substituting fixed edge-detection kernels for the N×N kernels. VGNetF-1.5MP achieves 64.4% (-3.2%) top-1 accuracy, and 66.2% (-1.4%) top-1 accuracy with additional Gaussian kernels.
1 Method of this paper
The author mainly visualizes three typical families of networks, grouped by the type of convolution from which they are built:
Standard convolution ==> ResNet-RS
Group convolution ==> RegNet
Depthwise separable convolution ==> MobileNet, ShuffleNetV2, and EfficientNets
These visualization results show that the M×N×N kernels exhibit distinctly different patterns and distributions at different stages of the network.
1.1 CNNs can learn to satisfy the sampling theorem
Previous work has generally held that convolutional neural networks ignore the classical sampling theorem, but the author finds that CNNs can satisfy the sampling theorem to some extent by learning low-pass filters, especially DSC-based networks such as MobileNetV1 and EfficientNets, as shown in Figure 2.
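To make this concrete, below is a minimal PyTorch sketch of the kind of fixed low-pass (Gaussian-like) depthwise filter that the paper observes strided layers learning. The 3×3 binomial kernel and the module name are my own illustrative choices, not the paper's code.

```python
import torch
import torch.nn as nn

class FixedGaussianDownsample(nn.Module):
    """Depthwise stride-2 convolution with a fixed 3x3 Gaussian-like kernel.

    Acts as a low-pass filter before subsampling, in the spirit of the
    anti-aliasing behaviour observed in learned strided DSC kernels.
    """
    def __init__(self, channels: int):
        super().__init__()
        # 3x3 binomial approximation of a Gaussian: outer([1,2,1],[1,2,1]) / 16
        g = torch.tensor([1.0, 2.0, 1.0])
        kernel = torch.outer(g, g) / 16.0
        self.conv = nn.Conv2d(channels, channels, 3, stride=2, padding=1,
                              groups=channels, bias=False)
        self.conv.weight.data.copy_(kernel.expand(channels, 1, 3, 3).clone())
        self.conv.weight.requires_grad_(False)  # fixed, not learnable

    def forward(self, x):
        return self.conv(x)

x = torch.randn(1, 32, 56, 56)
print(FixedGaussianDownsample(32)(x).shape)  # torch.Size([1, 32, 28, 28])
```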

1. Standard convolution / group convolution
As shown in Figures 2a and 2b, among the M N×N kernels of a layer there are one or more significant N×N kernels, such as blur kernels. This phenomenon implies that the parameters of these layers are redundant. Note that the significant kernels do not necessarily look like Gaussian kernels.
2. Depthwise separable convolution
The kernels of strided DSCs are usually similar to Gaussian kernels, in networks including but not limited to MobileNetV1, MobileNetV2, MobileNetV3, ShuffleNetV2, ReXNet, and EfficientNets. Moreover, the distribution of strided-DSC kernels is not a single Gaussian but a Gaussian mixture.
3. Kernels of the last convolution layer
Modern CNNs always use a global pooling layer before the classifier to reduce dimensionality. A similar phenomenon therefore also appears in the final depthwise convolution layer, as shown in Figure 4.

These visualizations suggest choosing depthwise convolution, rather than standard or group convolution, for the downsampling layers and the last layer. Moreover, fixed Gaussian kernels can be used in the downsampling layers.
1.2 Reuse feature maps between adjacent layers
Identity kernels and similar feature maps

As shown in the figure above, many depthwise convolution kernels in the middle of the network have a large value only at the center, just like an identity kernel. Since such a convolution merely passes its input to the next layer, convolving with an identity kernel leads to duplicated feature maps and computational redundancy. On the other hand, the figure below shows that many feature maps are similar (duplicated) between adjacent layers.
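A quick sanity check of this observation (a sketch of mine, not the paper's code): a depthwise convolution whose kernel is 1 at the center and 0 elsewhere reproduces its input exactly.

```python
import torch
import torch.nn.functional as F

# A 3x3 "identity" depthwise kernel: 1 at the center, 0 elsewhere.
channels = 8
kernel = torch.zeros(channels, 1, 3, 3)
kernel[:, :, 1, 1] = 1.0

x = torch.randn(1, channels, 14, 14)
y = F.conv2d(x, kernel, padding=1, groups=channels)
print(torch.allclose(x, y))  # True: the layer just copies its input
```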

Therefore, part of the depthwise convolutions can be replaced by identity mappings. Moreover, depthwise convolutions are slow in the early layers because they usually cannot fully utilize modern accelerators, as reported in ShuffleNetV2. Hence this method can improve both parameter efficiency and inference time.
1.3 Edge detectors as a replacement for learnable depthwise convolution
Edge features contain important information about an image. As shown in the figure below, most of the kernels resemble edge-detection kernels, such as the Sobel and Laplacian filter kernels. The proportion of such kernels decreases in the later layers, while the proportion of blur-like kernels increases.

Therefore, edge detectors may be able to replace the learnable depthwise convolutions in DSC-based networks to mix features across spatial locations. The author demonstrates this by replacing the learnable kernels with fixed edge-detection kernels.
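As an illustration of what such fixed edge-detection kernels could look like in code (my own sketch; the paper does not specify this per-channel assignment), the depthwise kernels below cycle through Sobel-x, Sobel-y, and Laplacian filters:

```python
import torch
import torch.nn as nn

def fixed_edge_depthwise(channels: int) -> nn.Conv2d:
    """Depthwise 3x3 conv whose kernels are fixed edge detectors.

    Channels cycle through Sobel-x, Sobel-y, and Laplacian kernels;
    the exact assignment per channel is an illustrative assumption.
    """
    sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
    sobel_y = sobel_x.t()
    laplacian = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])
    bank = [sobel_x, sobel_y, laplacian]

    conv = nn.Conv2d(channels, channels, 3, padding=1,
                     groups=channels, bias=False)
    with torch.no_grad():
        for c in range(channels):
            conv.weight[c, 0] = bank[c % len(bank)]
    conv.weight.requires_grad_(False)  # fixed, not learnable
    return conv

x = torch.randn(1, 16, 28, 28)
print(fixed_edge_depthwise(16)(x).shape)  # torch.Size([1, 16, 28, 28])
```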
2 Network architecture
2.1 DownsamplingBlock
The DownsamplingBlock halves the resolution and expands the number of channels. As shown in Figure (a), only the expanded channels are generated by pointwise convolution, so that existing features are reused. The kernels of the depthwise convolution can be randomly initialized or set to fixed Gaussian kernels.
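A minimal PyTorch sketch of my reading of this block; layer ordering, normalization, and activations are omitted and would differ from the official implementation:

```python
import torch
import torch.nn as nn

class DownsamplingBlock(nn.Module):
    """Halve resolution, expand channels; reuse the downsampled features.

    All input channels pass through a stride-2 depthwise conv and are kept
    as-is in the output; only the extra output channels come from a
    pointwise conv (an assumption based on Figure (a)).
    """
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        assert out_ch > in_ch
        self.dw = nn.Conv2d(in_ch, in_ch, 3, stride=2, padding=1,
                            groups=in_ch, bias=False)
        # Pointwise conv produces only the expanded channels.
        self.pw = nn.Conv2d(in_ch, out_ch - in_ch, 1, bias=False)

    def forward(self, x):
        x = self.dw(x)                             # halve spatial resolution
        return torch.cat([x, self.pw(x)], dim=1)   # reuse + expand
```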

2.2 HalfIdentityBlock
As shown in Figure (b), half of the depthwise convolutions are replaced with identity mappings, and half of the pointwise convolutions are removed while the block width is maintained.

Note that the right half of the input channels becomes the left half of the output channels, so that features are better reused.
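A sketch of this block under the same caveats as above (even channel count and channel routing are my assumptions from Figure (b)):

```python
import torch
import torch.nn as nn

class HalfIdentityBlock(nn.Module):
    """Half the channels pass through unchanged (identity mapping);
    the other half go through depthwise + pointwise convolution.
    """
    def __init__(self, channels: int):
        super().__init__()
        assert channels % 2 == 0, "assumes an even channel count"
        half = channels // 2
        self.dw = nn.Conv2d(half, half, 3, padding=1,
                            groups=half, bias=False)
        self.pw = nn.Conv2d(channels, half, 1, bias=False)

    def forward(self, x):
        left, right = x.chunk(2, dim=1)
        out = self.pw(torch.cat([self.dw(left), right], dim=1))
        # The right half of the input becomes the left half of the output.
        return torch.cat([right, out], dim=1)
```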
2.3 VGNet Architecture
VGNets are constructed from the DownsamplingBlock and the HalfIdentityBlock. The overall VGNetG-1.0MP architecture is shown in Table 1.

2.4 Variants of VGNet
To further study the impact of the N×N kernels, several variants of VGNets are introduced: VGNetC, VGNetG, and VGNetF.
VGNetC: all parameters are randomly initialized and learnable.
VGNetG: all parameters are randomly initialized and learnable, except the kernels of the DownsamplingBlocks.
VGNetF: all parameters of the depthwise convolutions are fixed.
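One plausible way to express the VGNetF constraint in PyTorch (an assumption of mine; the paper does not give this code) is to freeze every depthwise kernel after initialization:

```python
import torch.nn as nn

def freeze_depthwise(model: nn.Module) -> None:
    """VGNetF-style constraint: fix all depthwise conv kernels (illustrative)."""
    for m in model.modules():
        is_depthwise = (isinstance(m, nn.Conv2d)
                        and m.groups == m.in_channels
                        and m.kernel_size != (1, 1))
        if is_depthwise:
            m.weight.requires_grad_(False)
```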

3 Experiments

