RepOptimizer: it's actually RepVGG2
2022-06-25 23:24:00 【Tom Hardy】

Author: zzk
Source: GiantPandaCV
Preface
When designing neural network architectures, we often introduce prior knowledge, such as the residual structure of ResNet. However, we still train the network with a conventional optimizer.
In this work, we propose to use prior information to modify the gradient values instead. We call this gradient re-parameterization, and the corresponding optimizer RepOptimizer. We focus on the plain, VGG-style model and train it to obtain the RepOptVGG model, which has high training efficiency, a simple and direct structure, and extremely fast inference speed.
Official repository: RepOptimizer
Paper link: Re-parameterizing Your Optimizers rather than Architectures
Differences from RepVGG
RepVGG adds structural priors (such as the 1x1 and identity branches) and trains with a regular optimizer, whereas RepOptVGG adds this prior knowledge into the optimizer itself.
Although RepVGG can fuse its branches at inference time and become a plain model, it still has multiple branches during training, which requires more memory and training time. RepOptVGG, by contrast, is a truly plain model: it has a VGG structure throughout training.
We achieve this with a customized optimizer that realizes the equivalence between structural re-parameterization and gradient re-parameterization. The transformation is general and can be extended to more models.
Introducing structural priors into the optimizer
We noticed a phenomenon: in the special case where each branch contains only a linear trainable parameter plus a constant scaling value, the model still performs well as long as the scaling values are set reasonably. We call such a network block a Constant-Scale Linear Addition (CSLA) block.
Let's start with a simple CSLA example. Consider an input passed through two convolution branches, each followed by a constant linear scaling, and then added into one output (∗ denotes convolution):

Y = α_1 · (X ∗ W_1) + α_2 · (X ∗ W_2)
We now consider the equivalent transformation into a single branch. The equivalent transformation corresponds to two rules:
Initialization rules
The fused weight should be:

W' = α_1 · W_1 + α_2 · W_2
Update rule
For the fused weight, the update rule with learning rate λ is:

W' ← W' − λ · (α_1² + α_2²) · ∂L/∂W'

For the detailed derivation of this formula, refer to Appendix A of the paper.
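As a quick sketch of where the (α_1² + α_2²) factor comes from (not the paper's full Appendix A derivation): for the two-branch CSLA model, the chain rule gives ∂L/∂W_1 = α_1 · G and ∂L/∂W_2 = α_2 · G, where G = ∂L/∂W' is the gradient with respect to the fused kernel W' = α_1 · W_1 + α_2 · W_2. After one SGD step with learning rate λ, the branches become W_1 − λ·α_1·G and W_2 − λ·α_2·G, so the re-fused kernel is α_1·W_1 + α_2·W_2 − λ·(α_1² + α_2²)·G. The single fused convolution therefore has to scale its gradient by (α_1² + α_2²) to stay equivalent.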
A simple code example:
import torch
import numpy as np
np.random.seed(0)
np_x = np.random.randn(1, 1, 5, 5).astype(np.float32)
np_w1 = np.random.randn(1, 1, 3, 3).astype(np.float32)
np_w2 = np.random.randn(1, 1, 3, 3).astype(np.float32)
alpha1 = 1.0
alpha2 = 1.0
lr = 0.1
conv1 = torch.nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)
conv2 = torch.nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)
conv1.weight.data = torch.nn.Parameter(torch.tensor(np_w1))
conv2.weight.data = torch.nn.Parameter(torch.tensor(np_w2))
torch_x = torch.tensor(np_x, requires_grad=True)
out = alpha1 * conv1(torch_x) + alpha2 * conv2(torch_x)
loss = out.sum()
loss.backward()
torch_w1_updated = conv1.weight.detach().numpy() - conv1.weight.grad.numpy() * lr
torch_w2_updated = conv2.weight.detach().numpy() - conv2.weight.grad.numpy() * lr
print(torch_w1_updated + torch_w2_updated)

The equivalent fused single-branch version:

import torch
import numpy as np
np.random.seed(0)
np_x = np.random.randn(1, 1, 5, 5).astype(np.float32)
np_w1 = np.random.randn(1, 1, 3, 3).astype(np.float32)
np_w2 = np.random.randn(1, 1, 3, 3).astype(np.float32)
alpha1 = 1.0
alpha2 = 1.0
lr = 0.1
fused_conv = torch.nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)
fused_conv.weight.data = torch.nn.Parameter(torch.tensor(alpha1 * np_w1 + alpha2 * np_w2))
torch_x = torch.tensor(np_x, requires_grad=True)
out = fused_conv(torch_x)
loss = out.sum()
loss.backward()
torch_fused_w_updated = fused_conv.weight.detach().numpy() - (alpha1**2 + alpha2**2) * fused_conv.weight.grad.numpy() * lr
print(torch_fused_w_updated)

Running the two snippets prints the same updated fused weights, which confirms the equivalence.

In RepOptVGG, the corresponding CSLA block replaces the RepVGG block's 3x3 convolution, 1x1 convolution, and BN layers with a 3x3 convolution and a 1x1 convolution that carry learnable scaling parameters.
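To make the block structure concrete, here is a minimal sketch of such a CSLA-style block (my own illustration, not the official implementation). The per-channel scales are kept as fixed constants here, since it is the HyperSearch stage described below that treats them as trainable:

import torch
import torch.nn as nn

class CSLABlock(nn.Module):
    # A CSLA-style block: a 3x3 conv and a 1x1 conv, each multiplied by a constant
    # per-channel scale, plus an identity branch when in/out channels match.
    def __init__(self, in_channels, out_channels, scale_3x3, scale_1x1):
        super().__init__()
        self.conv3x3 = nn.Conv2d(in_channels, out_channels, 3, padding=1, bias=False)
        self.conv1x1 = nn.Conv2d(in_channels, out_channels, 1, bias=False)
        # Registered as buffers so the scales stay constant instead of being trained.
        self.register_buffer("s", scale_3x3.view(1, -1, 1, 1))
        self.register_buffer("t", scale_1x1.view(1, -1, 1, 1))
        self.has_identity = in_channels == out_channels

    def forward(self, x):
        out = self.s * self.conv3x3(x) + self.t * self.conv1x1(x)
        if self.has_identity:
            out = out + x  # identity branch, no scaling factor
        return out

x = torch.randn(1, 16, 32, 32)
block = CSLABlock(16, 16, torch.ones(16), 0.5 * torch.ones(16))
print(block(x).shape)  # torch.Size([1, 16, 32, 32])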
Extending further to the multi-branch case, suppose s and t are the scaling coefficients of the 3x3 convolution and the 1x1 convolution respectively. The corresponding update rule multiplies each entry of the fused kernel's gradient by one of three factors before the SGD step:

(1 + s² + t²), (s² + t²), or s²
The first factor, (1 + s² + t²), corresponds to input channels == output channels, where there are three branches in total: identity, conv3x3, and conv1x1.
The second factor, (s² + t²), corresponds to input channels != output channels, where only the conv3x3 and conv1x1 branches exist.
The third factor, s², corresponds to all other cases (kernel positions where only the 3x3 convolution contributes).
Note that CSLA contains no training-time nonlinearity such as BN, and no non-sequential trainable parameters; CSLA is only an indirect tool for describing RepOptimizer.
So one question remains: how do we determine the scaling factors?
HyperSearch
Inspired by DARTS, we replace the constant scaling factors in CSLA with trainable parameters and train the model on a small dataset (such as CIFAR-100). After this training on the small dataset, we fix these trainable parameters as constants.
For the specific training settings, please refer to the paper.
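As an illustration of the idea (a sketch under my own assumptions, not the repo's implementation), the HyperSearch block has the same structure with the scales promoted to trainable parameters, and only those searched scale values are kept afterwards:

import torch
import torch.nn as nn

class HyperSearchBlock(nn.Module):
    # Same structure as the CSLA block, but the per-channel scales are trainable
    # parameters during the proxy run on a small dataset such as CIFAR-100.
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv3x3 = nn.Conv2d(in_channels, out_channels, 3, padding=1, bias=False)
        self.conv1x1 = nn.Conv2d(in_channels, out_channels, 1, bias=False)
        self.s = nn.Parameter(torch.ones(out_channels))  # scale of the 3x3 branch
        self.t = nn.Parameter(torch.ones(out_channels))  # scale of the 1x1 branch
        self.has_identity = in_channels == out_channels

    def forward(self, x):
        out = self.s.view(1, -1, 1, 1) * self.conv3x3(x) \
            + self.t.view(1, -1, 1, 1) * self.conv1x1(x)
        return out + x if self.has_identity else out

# After the proxy training, the searched scales are frozen into constants; only these
# constants (not the proxy-trained conv weights) are used to build the gradient masks
# for RepOptimizer when training the plain target model from scratch.
block = HyperSearchBlock(16, 16)
s_const, t_const = block.s.detach().clone(), block.t.detach().clone()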
Experimental results

The experimental results look quite good: with no extra branches during training, the training batch size can be increased, and model throughput also improves.
With the earlier RepVGG, many people complained that it was hard to quantize. Under RepOptVGG, the plain model is much more quantization-friendly:
Reading the code
We mainly look at the repoptvgg.py file; the core class is RepVGGOptimizer.
The reinitialize method does the same job as RepVGG's fusion: it merges the 1x1 convolution weights and the identity branch into the 3x3 convolution:
if len(scales) == 2:
    conv3x3.weight.data = conv3x3.weight * scales[1].view(-1, 1, 1, 1) \
        + F.pad(kernel_1x1.weight, [1, 1, 1, 1]) * scales[0].view(-1, 1, 1, 1)
else:
    assert len(scales) == 3
    assert in_channels == out_channels
    identity = torch.from_numpy(np.eye(out_channels, dtype=np.float32).reshape(out_channels, out_channels, 1, 1))
    conv3x3.weight.data = conv3x3.weight * scales[2].view(-1, 1, 1, 1) + F.pad(kernel_1x1.weight, [1, 1, 1, 1]) * scales[1].view(-1, 1, 1, 1)
    if use_identity_scales:  # You may initialize the imaginary CSLA block with the trained identity_scale values. Makes almost no difference.
        identity_scale_weight = scales[0]
        conv3x3.weight.data += F.pad(identity * identity_scale_weight.view(-1, 1, 1, 1), [1, 1, 1, 1])
    else:
        conv3x3.weight.data += F.pad(identity, [1, 1, 1, 1])

Next, let's look at how the GradientMask is generated. If there are only the conv3x3 and conv1x1 branches, then according to the CSLA equivalence rules above, the mask for conv3x3 is:
mask = torch.ones_like(para) * (scales[1] ** 2).view(-1, 1, 1, 1)

For conv1x1, its contribution to the mask is the square of its scaling factor, added at the center of the conv3x3 kernel:
mask[:, :, 1:2, 1:2] += torch.ones(para.shape[0], para.shape[1], 1, 1) * (scales[0] ** 2).view(-1, 1, 1, 1)
If there is an identity branch, we also need to add 1.0 on the diagonal (the identity branch has no learnable scaling factor):
mask[ids, ids, 1:2, 1:2] += 1.0

If you are not sure why the identity branch corresponds to the diagonal, refer to the author's diagram in the RepVGG paper.
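Finally, to show where these masks take effect, here is a minimal sketch of the mechanism as I understand it (not the repo's exact optimizer code): during training, the optimizer multiplies each masked parameter's gradient element-wise by its mask before the ordinary SGD update.

import torch

def masked_sgd_step(named_params, grad_masks, lr=0.1):
    # named_params: dict of name -> parameter; grad_masks: dict of name -> mask tensor.
    with torch.no_grad():
        for name, p in named_params.items():
            if p.grad is None:
                continue
            g = p.grad
            if name in grad_masks:
                g = g * grad_masks[name]  # gradient re-parameterization
            p -= lr * g

# In the official implementation, this logic lives inside the RepVGGOptimizer class,
# together with momentum and weight decay.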
Summary
This paper has been out for a while, but it seems not many people have paid attention to it. In my opinion it is very practical work: it fixes the small pitfall left by the previous-generation RepVGG and truly achieves a completely plain model during training, which is friendly to quantization and pruning and very well suited to real deployment.