[Paper notes] Efficient CNN architecture design guided by visualization
2022-07-25 07:34:00
Paper

Preface
Modern efficient convolutional neural networks (CNNs) always use depthwise separable convolution (DSC) and neural architecture search (NAS) to reduce the number of parameters and the computational complexity, but they ignore some inherent characteristics of the network. Inspired by visualizing feature maps and N×N (N>1) convolution kernels, this paper introduces several guidelines for further improving parameter efficiency and inference speed. The parameter-efficient CNN architecture designed under these guidelines is called VGNetG, which achieves better accuracy and lower latency than previous networks with about 30%~50% fewer parameters. VGNetG-1.0MP achieves 67.7% top-1 accuracy on the ImageNet classification dataset with 0.99M parameters, and 69.2% top-1 accuracy with 1.14M parameters. Furthermore, the paper shows that edge detectors can replace the learnable depthwise convolution layers for mixing features, by substituting fixed edge-detection kernels for the N×N kernels. VGNetF-1.5MP achieves 64.4% (-3.2%) top-1 accuracy, and 66.2% (-1.4%) top-1 accuracy with additional Gaussian kernels.
1 Method of this paper
The author mainly visualizes three typical families of networks, grouped by the type of convolution from which they are built:
Standard convolution ==> ResNet-RS
Group convolution ==> RegNet
Depthwise separable convolution ==> MobileNet, ShuffleNetV2, and EfficientNets
These visualization results show that the M×N×N kernels exhibit distinctly different patterns and distributions at different stages of the network.
1.1 CNNs can learn to satisfy the sampling theorem
Previous work has generally held that convolutional neural networks ignore the classical sampling theorem, but the author finds that CNNs can satisfy the sampling theorem to some extent by learning low-pass filters, especially DSC-based networks such as MobileNetV1 and EfficientNets, as shown in Figure 2.
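To make this concrete, below is a minimal PyTorch sketch of the kind of fixed low-pass (Gaussian-like) depthwise filter that the paper observes strided layers learning. The 3×3 binomial kernel and the module name are my own illustrative choices, not the paper's code.

```python
import torch
import torch.nn as nn

class FixedGaussianDownsample(nn.Module):
    """Depthwise stride-2 convolution with a fixed 3x3 Gaussian-like kernel.

    Acts as a low-pass filter before subsampling, in the spirit of the
    anti-aliasing behaviour observed in learned strided DSC kernels.
    """
    def __init__(self, channels: int):
        super().__init__()
        # 3x3 binomial approximation of a Gaussian: outer([1,2,1],[1,2,1]) / 16
        g = torch.tensor([1.0, 2.0, 1.0])
        kernel = torch.outer(g, g) / 16.0
        self.conv = nn.Conv2d(channels, channels, 3, stride=2, padding=1,
                              groups=channels, bias=False)
        self.conv.weight.data.copy_(kernel.expand(channels, 1, 3, 3).clone())
        self.conv.weight.requires_grad_(False)  # fixed, not learnable

    def forward(self, x):
        return self.conv(x)

x = torch.randn(1, 32, 56, 56)
print(FixedGaussianDownsample(32)(x).shape)  # torch.Size([1, 32, 28, 28])
```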

1. Standard convolution / group convolution
As shown in Figures 2a and 2b, among the M N×N kernels of a layer there are one or more significant N×N kernels, such as blur kernels. This phenomenon implies that the parameters of these layers are redundant. Note that the significant kernels do not necessarily look like Gaussian kernels.
2. Depthwise separable convolution
The kernels of strided DSCs are usually similar to Gaussian kernels, in networks including but not limited to MobileNetV1, MobileNetV2, MobileNetV3, ShuffleNetV2, ReXNet, and EfficientNets. Moreover, the distribution of strided-DSC kernels is not a single Gaussian but a Gaussian mixture.
3. Kernels of the last convolution layer
Modern CNNs always use a global pooling layer before the classifier to reduce dimensionality. A similar phenomenon therefore also appears in the final depthwise convolution layer, as shown in Figure 4.

These visualizations suggest choosing depthwise convolution, rather than standard or group convolution, for the downsampling layers and the last layer. Moreover, fixed Gaussian kernels can be used in the downsampling layers.
1.2 Reuse feature maps between adjacent layers
Identity kernels and similar feature maps

As shown in the figure above, many depthwise convolution kernels in the middle of the network have a large value only at the center, just like an identity kernel. Since such a convolution merely passes its input to the next layer, convolving with an identity kernel leads to duplicated feature maps and computational redundancy. On the other hand, the figure below shows that many feature maps are similar (duplicated) between adjacent layers.
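A quick sanity check of this observation (a sketch of mine, not the paper's code): a depthwise convolution whose kernel is 1 at the center and 0 elsewhere reproduces its input exactly.

```python
import torch
import torch.nn.functional as F

# A 3x3 "identity" depthwise kernel: 1 at the center, 0 elsewhere.
channels = 8
kernel = torch.zeros(channels, 1, 3, 3)
kernel[:, :, 1, 1] = 1.0

x = torch.randn(1, channels, 14, 14)
y = F.conv2d(x, kernel, padding=1, groups=channels)
print(torch.allclose(x, y))  # True: the layer just copies its input
```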

Therefore, part of the depthwise convolutions can be replaced by identity mappings. Moreover, depthwise convolutions are slow in the early layers because they usually cannot fully utilize modern accelerators, as reported in ShuffleNetV2. Hence this method can improve both parameter efficiency and inference time.
1.3 Edge detectors as a replacement for learnable depthwise convolution
Edge features contain important information about an image. As shown in the figure below, most of the kernels resemble edge-detection kernels, such as the Sobel and Laplacian filter kernels. The proportion of such kernels decreases in the later layers, while the proportion of blur-like kernels increases.

Therefore, edge detectors may be able to replace the learnable depthwise convolutions in DSC-based networks to mix features across spatial locations. The author demonstrates this by replacing the learnable kernels with fixed edge-detection kernels.
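As an illustration of what such fixed edge-detection kernels could look like in code (my own sketch; the paper does not specify this per-channel assignment), the depthwise kernels below cycle through Sobel-x, Sobel-y, and Laplacian filters:

```python
import torch
import torch.nn as nn

def fixed_edge_depthwise(channels: int) -> nn.Conv2d:
    """Depthwise 3x3 conv whose kernels are fixed edge detectors.

    Channels cycle through Sobel-x, Sobel-y, and Laplacian kernels;
    the exact assignment per channel is an illustrative assumption.
    """
    sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
    sobel_y = sobel_x.t()
    laplacian = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])
    bank = [sobel_x, sobel_y, laplacian]

    conv = nn.Conv2d(channels, channels, 3, padding=1,
                     groups=channels, bias=False)
    with torch.no_grad():
        for c in range(channels):
            conv.weight[c, 0] = bank[c % len(bank)]
    conv.weight.requires_grad_(False)  # fixed, not learnable
    return conv

x = torch.randn(1, 16, 28, 28)
print(fixed_edge_depthwise(16)(x).shape)  # torch.Size([1, 16, 28, 28])
```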
2 Network architecture
2.1 DownsamplingBlock
The DownsamplingBlock halves the resolution and expands the number of channels. As shown in Figure (a), only the expanded channels are generated by pointwise convolution, so that existing features are reused. The kernels of the depthwise convolution can be randomly initialized or set to fixed Gaussian kernels.
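A minimal PyTorch sketch of my reading of this block; layer ordering, normalization, and activations are omitted and would differ from the official implementation:

```python
import torch
import torch.nn as nn

class DownsamplingBlock(nn.Module):
    """Halve resolution, expand channels; reuse the downsampled features.

    All input channels pass through a stride-2 depthwise conv and are kept
    as-is in the output; only the extra output channels come from a
    pointwise conv (an assumption based on Figure (a)).
    """
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        assert out_ch > in_ch
        self.dw = nn.Conv2d(in_ch, in_ch, 3, stride=2, padding=1,
                            groups=in_ch, bias=False)
        # Pointwise conv produces only the expanded channels.
        self.pw = nn.Conv2d(in_ch, out_ch - in_ch, 1, bias=False)

    def forward(self, x):
        x = self.dw(x)                             # halve spatial resolution
        return torch.cat([x, self.pw(x)], dim=1)   # reuse + expand
```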

2.2 HalfIdentityBlock
As shown in Figure (b), half of the depthwise convolutions are replaced with identity mappings, and half of the pointwise convolutions are removed while the block width is maintained.

Note that the right half of the input channels becomes the left half of the output channels, so that features are better reused.
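A sketch of this block under the same caveats as above (even channel count and channel routing are my assumptions from Figure (b)):

```python
import torch
import torch.nn as nn

class HalfIdentityBlock(nn.Module):
    """Half the channels pass through unchanged (identity mapping);
    the other half go through depthwise + pointwise convolution.
    """
    def __init__(self, channels: int):
        super().__init__()
        assert channels % 2 == 0, "assumes an even channel count"
        half = channels // 2
        self.dw = nn.Conv2d(half, half, 3, padding=1,
                            groups=half, bias=False)
        self.pw = nn.Conv2d(channels, half, 1, bias=False)

    def forward(self, x):
        left, right = x.chunk(2, dim=1)
        out = self.pw(torch.cat([self.dw(left), right], dim=1))
        # The right half of the input becomes the left half of the output.
        return torch.cat([right, out], dim=1)
```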
2.3 VGNet Architecture
VGNets are constructed from the DownsamplingBlock and the HalfIdentityBlock. The overall VGNetG-1.0MP architecture is shown in Table 1.

2.4 Variants of VGNet
To further study the impact of the N×N kernels, several variants of VGNets are introduced: VGNetC, VGNetG, and VGNetF.
VGNetC: all parameters are randomly initialized and learnable.
VGNetG: all parameters are randomly initialized and learnable, except the kernels of the DownsamplingBlocks.
VGNetF: all parameters of the depthwise convolutions are fixed.
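One plausible way to express the VGNetF constraint in PyTorch (an assumption of mine; the paper does not give this code) is to freeze every depthwise kernel after initialization:

```python
import torch.nn as nn

def freeze_depthwise(model: nn.Module) -> None:
    """VGNetF-style constraint: fix all depthwise conv kernels (illustrative)."""
    for m in model.modules():
        is_depthwise = (isinstance(m, nn.Conv2d)
                        and m.groups == m.in_channels
                        and m.kernel_size != (1, 1))
        if is_depthwise:
            m.weight.requires_grad_(False)
```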

3 Experiments

