In-Depth Analysis of MobileNet and Its Variants
2022-06-23 13:39:00 【Xiaobai learns vision】
Translator's note: I have been reading up on lightweight networks recently and found this overview excellent, so I translated it. It summarizes the variants well, and the schematic diagrams are very clear. I hope it gives you some inspiration.
Introduction
In this article, I give an overview of the building blocks used in efficient CNN models such as MobileNet and its variants, and explain why they are so efficient. In particular, I provide a visual explanation of how convolution operates in the spatial and channel domains.
Components used in efficient models
Before explaining specific efficient CNN models, let's first examine the computational cost of the building blocks they use, and see how convolution is performed in the spatial and channel domains.

Suppose H x W is the spatial size of the output feature map, N is the number of input channels, K x K is the size of the convolution kernel, and M is the number of output channels. Then the computational cost of a standard convolution is HWNK²M.
The important point here is that the cost of standard convolution is proportional to (1) the spatial size H x W of the output feature map, (2) the kernel size K², and (3) the numbers of input and output channels N x M.
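As a sanity check on this formula, here is a minimal helper; the dimensions in the example are hypothetical, chosen only for illustration:

```python
def standard_conv_macs(H, W, N, K, M):
    """Multiply-accumulate count of a standard KxK convolution:
    every one of the H*W*M output values sums over N*K*K products."""
    return H * W * N * K * K * M

# e.g. a 3x3 conv from 64 to 128 channels with a 56x56 output map
print(standard_conv_macs(56, 56, 64, 3, 128))  # 231211008
```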
Standard convolution performs this computation jointly over the spatial and channel domains. By factorizing this convolution, CNNs can be sped up significantly, as illustrated below.
Convolution
First, I give an intuitive explanation of how standard convolution operates over the spatial and channel domains, with computational cost HWNK²M.
I draw lines between the input and the output to visualize their dependency. The number of lines roughly indicates the computational cost of the convolution in the spatial and channel domains, respectively.

For example, conv3x3, the most common convolution, can be visualized as shown above. We can see that the input and output are locally connected in the spatial domain, while fully connected in the channel domain.

Next, conv1x1, or pointwise convolution, which is used to change the number of channels, is shown above. Because its kernel size is 1x1, the computational cost of this convolution is HWNM, 1/9 that of conv3x3. This convolution is used to "blend" information among channels.
Grouped Convolution
Grouped convolution is a variant of convolution in which the channels of the input feature map are split into groups, and convolution is performed independently over the channels of each group.
If G denotes the number of groups, the computational cost of grouped convolution is HWNK²M/G, i.e., 1/G of that of standard convolution.
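The 1/G saving can be checked directly against the standard-convolution formula; this is a sketch with hypothetical dimensions:

```python
def standard_conv_macs(H, W, N, K, M):
    return H * W * N * K * K * M

def grouped_conv_macs(H, W, N, K, M, G):
    """Each of the G groups convolves N/G input channels into M/G
    output channels, so the total cost is 1/G of the standard conv."""
    assert N % G == 0 and M % G == 0
    return G * (H * W * (N // G) * K * K * (M // G))

H, W, N, K, M = 56, 56, 64, 3, 128
for G in (1, 2, 4, 8):
    assert grouped_conv_macs(H, W, N, K, M, G) * G == standard_conv_macs(H, W, N, K, M)
```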

The case of conv3x3 with G = 2. We can see that the number of connections in the channel domain is smaller than in standard convolution, indicating a lower computational cost.

With conv3x3 and G = 3, the connections become even sparser.

With conv1x1 and G = 2, we see that conv1x1 can also be grouped. This type of convolution is used in ShuffleNet.

The case of conv1x1 with G = 3.
Depthwise Convolution
In depthwise convolution, each input channel is convolved independently. It can also be defined as a special case of grouped convolution in which the numbers of input and output channels are equal and G equals the number of channels.

As shown above, depthwise convolution greatly reduces the computational cost by omitting convolution in the channel domain.
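Setting G = N and M = N in the grouped-convolution formula recovers the depthwise cost HWNK², which no longer contains the factor M (hypothetical dimensions again):

```python
def grouped_conv_macs(H, W, N, K, M, G):
    return H * W * N * K * K * M // G

def depthwise_conv_macs(H, W, N, K):
    """Depthwise conv: one private KxK filter per channel."""
    return H * W * N * K * K

# depthwise conv == grouped conv with G = N groups and M = N outputs
H, W, N, K = 56, 56, 64, 3
assert depthwise_conv_macs(H, W, N, K) == grouped_conv_macs(H, W, N, K, N, N)
```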
Channel Shuffle
Channel shuffle is an operation (layer) used in ShuffleNet that changes the order of the channels. It is implemented by tensor reshape and transpose.
More precisely, let GN' (= N) denote the number of input channels. The input channel dimension is first reshaped into (G, N'), then (G, N') is transposed into (N', G), and finally flattened back into the same shape as the input. Here G denotes the number of groups in the grouped convolution, which is used together with the channel shuffle layer in ShuffleNet.
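The reshape-transpose-flatten sequence can be sketched in a few lines of plain Python (channel indices stand in for real feature maps; this mirrors the description above, not ShuffleNet's actual code):

```python
def channel_shuffle(channels, G):
    """Reshape a flat list of N = G * N' channels into (G, N'),
    transpose to (N', G), and flatten back, as described above."""
    N = len(channels)
    assert N % G == 0
    n_prime = N // G
    groups = [channels[g * n_prime:(g + 1) * n_prime] for g in range(G)]  # (G, N')
    return [groups[g][i] for i in range(n_prime) for g in range(G)]       # transpose + flatten

# 6 channels in 2 groups: the two groups end up interleaved
print(channel_shuffle([0, 1, 2, 3, 4, 5], 2))  # [0, 3, 1, 4, 2, 5]
```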
Although the cost of channel shuffle cannot be expressed in terms of the number of multiply-add operations (MACs), there should be some overhead.

The case of channel shuffle with G = 2. No convolution is performed; only the channel order is changed.

The case of channel shuffle with G = 3.
Efficient Models
In the following, for each efficient CNN model, I give an intuitive explanation of why it is efficient and of how convolution is performed in the spatial and channel domains.
ResNet (Bottleneck Version)
ResNet's residual unit with the bottleneck architecture is a good starting point for comparison with the other models.

As shown above, a residual unit with the bottleneck architecture consists of conv1x1, conv3x3, and conv1x1. The first conv1x1 reduces the input channel dimension, lowering the computational cost of the subsequent conv3x3. The final conv1x1 restores the dimension of the output channels.
ResNeXt
ResNeXt is an efficient CNN model that can be seen as a special case of ResNet whose conv3x3 is replaced with a grouped conv3x3. Thanks to the efficient grouped conv, the channel-reduction rate of conv1x1 can be milder than in ResNet, resulting in better accuracy at the same computational cost.

MobileNet (Separable Conv)
MobileNet is a stack of separable convolution modules, each composed of a depthwise conv and a conv1x1 (pointwise conv).

Separable convolution performs convolution independently in the spatial and channel domains. This factorization significantly reduces the computational cost from HWNK²M to HWNK² (depthwise) + HWNM (conv1x1), i.e., HWN(K² + M) in total. Typically M >> K² (e.g., K = 3 and M ≥ 32), so the cost is reduced roughly by a factor of 8 to 9.
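The reduction rate follows directly from the two formulas; with the hypothetical dimensions below (K = 3, M = 128), the ratio lands inside the 1/8-1/9 range stated above:

```python
def standard_conv_macs(H, W, N, K, M):
    return H * W * N * K * K * M

def separable_conv_macs(H, W, N, K, M):
    """Depthwise part (HWNK^2) plus pointwise part (HWNM)."""
    return H * W * N * (K * K + M)

H, W, N, K, M = 56, 56, 64, 3, 128
ratio = standard_conv_macs(H, W, N, K, M) / separable_conv_macs(H, W, N, K, M)
print(round(ratio, 1))  # 8.4
```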
The important point here is that the computational bottleneck is now conv1x1!
ShuffleNet
The motivation of ShuffleNet is the fact mentioned above: conv1x1 is the bottleneck of separable convolution. While conv1x1 is already efficient and there seems to be no room for improvement, grouped conv1x1 can be used for this purpose!

The figure above illustrates the ShuffleNet module. The important building block here is the channel shuffle layer, which "shuffles" the order of channels among groups in the grouped convolution. Without channel shuffle, the outputs of the grouped convolutions would never be exploited across groups, resulting in a drop in accuracy.
MobileNet-v2
MobileNet-v2 uses a module architecture similar to ResNet's residual unit with the bottleneck; it is a modified version of the residual unit in which conv3x3 is replaced by depthwise convolution.

As you can see above, in contrast to the standard bottleneck architecture, the first conv1x1 increases the channel dimension, the depthwise conv is then performed, and the last conv1x1 decreases the channel dimension.

By reordering the building blocks as above and comparing with MobileNet-v1 (separable conv), we can see how this architecture works (this reordering does not change the overall model architecture, because MobileNet-v2 is a stack of this module).
That is, the above module can be regarded as a modified version of separable convolution in which the single conv1x1 of the separable conv is factorized into two conv1x1s. Let T denote the expansion factor of the channel dimension. The computational cost of the two conv1x1s is 2HWN²/T, while that of the conv1x1 in the separable conv is HWN². In [5], T = 6 is used, reducing the computational cost of conv1x1 by a factor of 3 (T/2 in general).
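Viewed in this reordered frame, the single conv1x1 of separable conv (N → N channels) becomes two conv1x1s through a bottleneck of N/T channels. A small sketch with a hypothetical N and the paper's T = 6:

```python
def pointwise_macs(H, W, n_in, n_out):
    return H * W * n_in * n_out

H, W, N, T = 56, 56, 96, 6
single_conv1x1 = pointwise_macs(H, W, N, N)           # HWN^2, separable conv
two_conv1x1 = (pointwise_macs(H, W, N, N // T)        # N -> N/T
               + pointwise_macs(H, W, N // T, N))     # N/T -> N
assert two_conv1x1 == 2 * H * W * N * N // T          # 2HWN^2/T
print(single_conv1x1 // two_conv1x1)  # 3, i.e. a T/2-fold reduction
```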
FD-MobileNet
Finally, Fast-Downsampling MobileNet (FD-MobileNet) [10] is introduced. In this model, downsampling is performed in earlier layers than in MobileNet. This simple trick reduces the total computational cost. The reason lies in the combination of the traditional downsampling strategy and the cost characteristics of separable conv.
Starting with VGGNet, many models adopt the same downsampling strategy: perform downsampling, then double the number of channels in the subsequent layers. For standard convolution, the computational cost does not change after downsampling, because it is defined by HWNK²M. For separable conv, however, the cost decreases after downsampling: from HWN(K² + M) to (H/2)(W/2)(2N)(K² + 2M) = HWN(K²/2 + M). This saving is relatively dominant when M is not large (i.e., in the earlier layers).
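The asymmetry between the two convolution types under the "halve the resolution, double the channels" rule can be verified numerically (hypothetical dimensions):

```python
def standard_conv_macs(H, W, N, K, M):
    return H * W * N * K * K * M

def separable_conv_macs(H, W, N, K, M):
    return H * W * N * (K * K + M)

H, W, N, K, M = 56, 56, 32, 3, 32
# downsample: spatial size halves, channel counts double
before_std = standard_conv_macs(H, W, N, K, M)
after_std = standard_conv_macs(H // 2, W // 2, 2 * N, K, 2 * M)
assert before_std == after_std           # standard conv: cost unchanged

before_sep = separable_conv_macs(H, W, N, K, M)
after_sep = separable_conv_macs(H // 2, W // 2, 2 * N, K, 2 * M)
assert after_sep < before_sep            # separable conv: cost drops
```

This is why moving downsampling to earlier layers, where M is still small, saves proportionally more for a network built from separable convolutions.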
The figure below summarizes the whole article.
