PyTorch structural reparameterization: RepVGGBlock
2022-07-25 00:40:00 【Hebi tongzj】
ShuffleNet v2 proposed four design guidelines for lightweight networks:
- For the same FLOPs, memory access cost (MAC) is minimized when the input and output channel counts are equal
- For the same FLOPs, using too many groups in group convolution increases MAC
- Network fragmentation (multi-branch structures) is unfriendly to parallel acceleration
- The memory and time cost of element-wise operations cannot be ignored
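The first guideline follows from ShuffleNet v2's cost model for a 1×1 convolution on an h×w feature map: FLOPs B = h·w·c1·c2 and MAC = h·w·(c1 + c2) + c1·c2. For a fixed B the product c1·c2 is fixed, so MAC is smallest when c1 + c2 is smallest, i.e. when c1 = c2. A small brute-force check is sketched below; the feature-map size and FLOPs budget are arbitrary example values, and `mac_1x1` is a hypothetical helper name.

```python
def mac_1x1(h, w, c1, c2):
    """ShuffleNet v2's MAC estimate for a 1x1 conv: input map + output map + weights."""
    return h * w * (c1 + c2) + c1 * c2


h = w = 56
budget = h * w * 256 * 256                   # fixed FLOPs B = h * w * c1 * c2 (example value)
candidates = []
for c1 in range(1, 4097):
    c2, rem = divmod(budget // (h * w), c1)  # c2 is fully determined by the budget
    if rem == 0:
        candidates.append((mac_1x1(h, w, c1, c2), c1, c2))
best_mac, best_c1, best_c2 = min(candidates)
print(best_c1, best_c2)                      # 256 256
```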
In recent years, convolutional neural network architectures have become increasingly complex; thanks to their good convergence, multi-branch structures have become more and more popular.
However, a multi-branch structure cannot make effective use of parallel acceleration, and it also increases MAC.

To let a simple structure reach the same accuracy as a multi-branch one, RepVGG uses a multi-branch structure during training (3×3 convolution + 1×1 convolution + identity mapping) to benefit from its good convergence; for inference and deployment, reparameterization converts the multi-branch structure into a single-path structure, so the simple structure delivers maximum speed.

Reparameterization
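In the multi-branch structure used during training, every branch contains a BN layer.
In eval mode, a BN layer computes with four parameters, mean, var, weight and bias (written μ, σ², γ, β below, with ε the small constant `eps`), and transforms an input x as follows:

$$y = \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} \cdot \gamma + \beta$$

This can be rewritten in the linear form y = w·x + b:

$$y = \frac{\gamma}{\sqrt{\sigma^2 + \epsilon}} \cdot x + \left(\beta - \frac{\gamma \mu}{\sqrt{\sigma^2 + \epsilon}}\right)$$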

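import torch
from torch import nn
class BatchNorm(nn.BatchNorm2d):
    def unpack(self):
        # Per-channel (w, b) such that, in eval mode, bn(x) == x * w + b
        mean, weight, bias = self.running_mean, self.weight, self.bias
        std = (self.running_var + self.eps).sqrt()
        eq_weight = weight / std
        eq_bias = bias - weight * mean / std
        return eq_weight, eq_bias
bn = BatchNorm(8).eval()
# Fill running_mean, running_var, weight and bias with random values
bn.running_mean.data, bn.running_var.data, bn.weight.data, bn.bias.data = torch.rand([4, 8])
image = torch.rand([1, 8, 1, 1])
print(bn(image).view(-1))
# Convert the BN parameters to (w, b) form; this print should match the one above
weight, bias = bn.unpack()
print(image.view(-1) * weight + bias)

Because the BN layer already fits a per-channel offset, a convolution layer followed by BN is built without a bias of its own. With W the convolution kernel, * the convolution operation, and μ, σ², γ, β the BN mean, var, weight and bias, the pair computes:

$$y = \gamma \cdot \frac{(W * x) - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta = \left(\frac{\gamma}{\sqrt{\sigma^2 + \epsilon}} \cdot W\right) * x + \left(\beta - \frac{\gamma \mu}{\sqrt{\sigma^2 + \epsilon}}\right)$$

Therefore, a convolution layer followed by a BN layer is equivalent to a single convolution layer with a bias: the kernel of each output channel is scaled by γ/√(σ²+ε), and the bias is β − γμ/√(σ²+ε).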
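This equivalence can be checked numerically. The following is a minimal sketch, assuming a plain bias-free `nn.Conv2d` followed by a standard `nn.BatchNorm2d` in eval mode; the helper name `fuse_conv_bn` is hypothetical and is not part of the reference code.

```python
import torch
from torch import nn


@torch.no_grad()
def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold an eval-mode BN into the bias-free conv in front of it (hypothetical helper)."""
    std = (bn.running_var + bn.eps).sqrt()
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding,
                      dilation=conv.dilation, groups=conv.groups, bias=True)
    # W' = (gamma / std) * W, applied per output channel
    fused.weight.copy_(conv.weight * (bn.weight / std).view(-1, 1, 1, 1))
    # b' = beta - gamma * mu / std
    fused.bias.copy_(bn.bias - bn.weight * bn.running_mean / std)
    return fused


torch.manual_seed(0)
conv = nn.Conv2d(8, 16, 3, padding=1, bias=False)
bn = nn.BatchNorm2d(16).eval()
with torch.no_grad():  # give BN non-trivial statistics and affine parameters
    bn.running_mean.uniform_(-1, 1)
    bn.running_var.uniform_(0.5, 2.0)
    bn.weight.uniform_(0.5, 1.5)
    bn.bias.uniform_(-1, 1)
    x = torch.randn(2, 8, 10, 10)
    reference = bn(conv(x))
    fused_out = fuse_conv_bn(conv, bn)(x)
print(torch.allclose(reference, fused_out, atol=1e-5))  # True
```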

The identity mapping can also be expressed as a 1×1 convolution:
- The weight of nn.Conv2d(c1, c2, kernel_size=1) has shape [c2, c1, 1, 1], so it can be viewed as a [c2, c1] linear layer that transforms the channels of each pixel independently (see: PyTorch two-dimensional multi-channel convolution)
- When c1 = c2 and this linear layer is the identity matrix, the convolution is equivalent to the identity mapping
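Both points can be checked numerically with `torch.nn.functional.conv2d`. The sketch below uses arbitrary example channel and group counts; it also shows that for a grouped convolution the identity kernel becomes a stack of smaller identity matrices, one per group, which is the same construction the `merge` method below builds with `torch.cat`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
c, g = 6, 3                                   # channel count and number of groups (example values)
x = torch.randn(1, c, 5, 5)

# Plain 1x1 conv whose [c, c] weight matrix is the identity
y_plain = F.conv2d(x, torch.eye(c).view(c, c, 1, 1))

# Grouped 1x1 conv: weight shape is [c, c // g, 1, 1]; output channel o reads local
# input channel o % (c // g) of its own group, so stack g identity matrices
eye_grouped = torch.cat([torch.eye(c // g)] * g, dim=0).view(c, c // g, 1, 1)
y_grouped = F.conv2d(x, eye_grouped, groups=g)

# A general 1x1 conv applies the same [c2, c1] matrix to the channel vector of every pixel
mat = torch.randn(4, c)
y_conv = F.conv2d(x, mat.view(4, c, 1, 1))
y_linear = torch.einsum('oc,nchw->nohw', mat, x)

print(torch.allclose(y_plain, x), torch.allclose(y_grouped, x),
      torch.allclose(y_conv, y_linear, atol=1e-5))  # True True True
```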
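A 1×1 convolution can be written as a 3×3 convolution by zero-padding its kernel so that the original weight sits at the center (the outputs stay aligned because the 3×3 branch uses padding 1 and the 1×1 branch uses padding 0 with the same stride). The computation of the multi-branch structure can therefore be expressed as:

$$y = \left(W'_{3} * x + b'_{3}\right) + \left(W'_{1} * x + b'_{1}\right) + \left(W'_{0} * x + b'_{0}\right) = \left(W'_{3} + \operatorname{pad}(W'_{1}) + \operatorname{pad}(W'_{0})\right) * x + \left(b'_{3} + b'_{1} + b'_{0}\right)$$

where $W'_{3}, b'_{3}$ and $W'_{1}, b'_{1}$ are the kernel and bias of the 3×3 and 1×1 branches after folding in their BN layers, $W'_{0}, b'_{0}$ are those of the identity branch (an identity-matrix 1×1 kernel folded with its BN), and $\operatorname{pad}(\cdot)$ zero-pads a 1×1 kernel to 3×3.

Thus the whole block is equivalent to a single new 3×3 convolution (the same argument extends to group convolution and to 5×5 convolution).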
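The zero-padding step can be verified the same way. The sketch below, with arbitrary example shapes, pads a 1×1 kernel to 3×3 with `F.pad` so that its weight lands at the kernel center, then checks that a 3×3 branch (padding 1) plus a 1×1 branch (padding 0) equals one 3×3 convolution with the summed kernel and bias. It covers only the convolution-plus-bias part; folding each BN into its branch is the step described earlier.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
c1, c2, stride = 4, 8, 2              # example sizes; both branches must share the stride
x = torch.randn(1, c1, 9, 9)
w3, b3 = torch.randn(c2, c1, 3, 3), torch.randn(c2)
w1, b1 = torch.randn(c2, c1, 1, 1), torch.randn(c2)

# Multi-branch: 3x3 conv with padding 1 plus 1x1 conv with padding 0
y_branches = (F.conv2d(x, w3, b3, stride=stride, padding=1)
              + F.conv2d(x, w1, b1, stride=stride))

# Single path: zero-pad the 1x1 kernel to 3x3 (one element on each side of the last two dims)
w1_as_3x3 = F.pad(w1, [1, 1, 1, 1])
y_single = F.conv2d(x, w3 + w1_as_3x3, b3 + b1, stride=stride, padding=1)

print(torch.allclose(y_branches, y_single, atol=1e-4))  # True
```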
In a speed test on an NVIDIA 1080Ti, feeding a [32, 2048, 56, 56] input to convolutions whose output has the same number of channels and the same size, 3×3 convolution achieved the highest computational density (floating-point operations per second).
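To repeat this kind of measurement on your own hardware, a rough sketch follows. It is not the benchmark behind the reported numbers: the default input is far smaller than [32, 2048, 56, 56] so it runs quickly on a CPU, and the tested kernel sizes, the FLOPs formula (2 × multiply-accumulates, bias ignored) and the helper name `conv_throughput` are assumptions made for illustration.

```python
import time
import torch
from torch import nn


@torch.no_grad()
def conv_throughput(kernel_sizes=(1, 3, 5, 7), shape=(2, 64, 28, 28), repeat=5, device='cpu'):
    """Rough GFLOP/s of k x k convs that keep the channel count and spatial size unchanged."""
    n, c, h, w = shape
    x = torch.randn(shape, device=device)
    results = {}
    for k in kernel_sizes:
        conv = nn.Conv2d(c, c, k, padding=k // 2, bias=False).to(device)
        conv(x)                                    # warm-up run
        if device != 'cpu':
            torch.cuda.synchronize()               # assumes a CUDA device when not on CPU
        start = time.perf_counter()
        for _ in range(repeat):
            conv(x)
        if device != 'cpu':
            torch.cuda.synchronize()
        elapsed = (time.perf_counter() - start) / repeat
        flops = 2 * n * c * c * k * k * h * w      # 2 FLOPs (multiply + add) per multiply-accumulate
        results[k] = flops / elapsed / 1e9
    return results


for k, gflops in conv_throughput().items():
    print(f'{k}x{k} conv: {gflops:.1f} GFLOP/s')
```

Results depend heavily on the backend (cuDNN, MKL-DNN) and the input shape, so treat them as indicative only.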

Reproducing the block
Reference code: https://github.com/DingXiaoH/RepVGG
I refactored the paper's source code to make it more readable and easier to use (and, to make it portable to YOLO projects, removed the L2-norm computation).
I also wrote the reparameterization function as a static method of the class, so a whole model can be reparameterized in one call.
from collections import OrderedDict
import torch
import torch.nn.functional as F
from torch import nn
class BatchNorm(nn.BatchNorm2d):
    def unpack(self):
        # Per-channel (w, b) such that, in eval mode, bn(x) == x * w + b
        mean, weight, bias = self.running_mean, self.weight, self.bias
        std = (self.running_var + self.eps).sqrt()
        eq_weight = weight / std
        eq_bias = bias - weight * mean / std
        return eq_weight, eq_bias
class RepVGGBlock(nn.Module):
    def __init__(self, c1, c2, k=3, s=1, g=1, deploy=False):
        super(RepVGGBlock, self).__init__()
        self.deploy = deploy
        # Check the convolution kernel size
        assert k & 1, 'The convolution kernel size must be odd'
        # Parameters of the main-branch convolution
        self.conv_main_config = dict(
            in_channels=c1, out_channels=c2, kernel_size=k,
            stride=s, padding=k // 2, groups=g
        )
        if deploy:
            # Inference structure: one convolution with bias and no side branches
            self.conv_main = nn.Conv2d(**self.conv_main_config, bias=True)
            self.conv_1x1, self.identity = None, None
        else:
            # Main branch: k×k convolution + BN
            self.conv_main = nn.Sequential(OrderedDict(
                conv=nn.Conv2d(**self.conv_main_config, bias=False),
                bn=BatchNorm(c2)
            ))
            # 1×1 convolution branch (omitted when the main branch is already 1×1)
            self.conv_1x1 = nn.Sequential(OrderedDict(
                conv=nn.Conv2d(c1, c2, 1, s, padding=0, groups=g, bias=False),
                bn=BatchNorm(c2)
            )) if k != 1 else None
            # Identity-mapping branch (BN only), possible only when the shape is preserved
            self.identity = BatchNorm(c2) if c1 == c2 and s == 1 else None
    def forward(self, x, act=F.silu):
        y = self.conv_main(x)
        if self.conv_1x1 is not None:
            y = y + self.conv_1x1(x)
        if self.identity is not None:
            y = y + self.identity(x)
        # Apply the activation function
        return act(y) if act else y
    @staticmethod
    @torch.no_grad()
    def merge(model: nn.Module):
        # Traverse all submodules and merge every RepVGGBlock that still has the training structure
        for m in list(model.modules()):
            if isinstance(m, RepVGGBlock) and not m.deploy:
                # Main-branch information
                kernel = m.conv_main.conv.weight
                (c2, c1_per_group, k, _), g = kernel.shape, m.conv_main.conv.groups
                center_pos = k // 2
                # Convert the main branch: scale each output-channel kernel by its BN weight
                bn_weight, bn_bias = m.conv_main.bn.unpack()
                kernel_weight, kernel_bias = kernel * bn_weight.view(-1, 1, 1, 1), bn_bias
                # Convert the 1×1 convolution branch and add it at the kernel center
                if m.conv_1x1 is not None:
                    kernel_1x1 = m.conv_1x1.conv.weight[..., 0, 0]
                    bn_weight, bn_bias = m.conv_1x1.bn.unpack()
                    kernel_weight[..., center_pos, center_pos] += kernel_1x1 * bn_weight.view(-1, 1)
                    kernel_bias += bn_bias
                # Convert the identity branch: one identity matrix per group, added at the kernel center
                if m.identity is not None:
                    kernel_id = torch.cat([torch.eye(c1_per_group)] * g, dim=0).to(kernel)
                    bn_weight, bn_bias = m.identity.unpack()
                    kernel_weight[..., center_pos, center_pos] += kernel_id * bn_weight.view(-1, 1)
                    kernel_bias += bn_bias
                # Create the merged convolution on the same device and dtype as the original kernel
                m.conv_main = nn.Conv2d(**m.conv_main_config, bias=True).to(kernel)
                m.conv_main.weight.copy_(kernel_weight)
                m.conv_main.bias.copy_(kernel_bias)
                # Delete the merged branches
                m.deploy = True
                delattr(m, 'conv_1x1')
                delattr(m, 'identity')
                m.conv_1x1, m.identity = None, None

Next, I build a complete model to verify:
- Whether the merge function changes the network structure
- Whether the model's outputs are consistent before and after reparameterization
- Whether the model's inference speed improves after reparameterization
if __name__ == '__main__':
    class RepVGG(nn.Module):
        def __init__(self, num_blocks, num_classes=1000, width_multiplier=None, deploy=False):
            super(RepVGG, self).__init__()
            assert len(width_multiplier) == 4
            self.deploy = deploy
            # Number of input channels of the trunk
            self.in_planes = min(64, int(64 * width_multiplier[0]))
            self.stage0 = RepVGGBlock(3, self.in_planes, k=3, s=2, deploy=self.deploy)
            # The trunk has four stages, each a cascade of several RepVGGBlocks
            self.stage1 = self._make_stage(int(64 * width_multiplier[0]), num_blocks[0], stride=2)
            self.stage2 = self._make_stage(int(128 * width_multiplier[1]), num_blocks[1], stride=2)
            self.stage3 = self._make_stage(int(256 * width_multiplier[2]), num_blocks[2], stride=2)
            self.stage4 = self._make_stage(int(512 * width_multiplier[3]), num_blocks[3], stride=2)
            self.gap = nn.AdaptiveAvgPool2d(output_size=1)
            self.linear = nn.Linear(int(512 * width_multiplier[3]), num_classes)
        def _make_stage(self, planes, num_blocks, stride):
            strides = [stride] + [1] * (num_blocks - 1)
            blocks = []
            for stride in strides:
                blocks.append(RepVGGBlock(self.in_planes, planes, k=3, s=stride, deploy=self.deploy))
                self.in_planes = planes
            return nn.Sequential(*blocks)
        def forward(self, x):
            out = self.stage0(x)
            out = self.stage1(out)
            out = self.stage2(out)
            out = self.stage3(out)
            out = self.stage4(out)
            out = self.gap(out)
            out = out.view(out.size(0), -1)
            out = self.linear(out)
            return out
    vgg = RepVGG(num_blocks=[1, 1, 1, 1], num_classes=20,
                 width_multiplier=[1, 1, 1, 1]).eval()
    print(vgg)
    # Give every BatchNorm random parameters
    for m in vgg.modules():
        if isinstance(m, BatchNorm):
            m.running_mean.data, m.running_var.data, \
                m.weight.data, m.bias.data = torch.rand([4, m.num_features])
    image = torch.rand([1, 3, 224, 224])
    class Timer:
        prefix = 'Cost: '
        def __init__(self, fun, *args, **kwargs):
            import time
            start = time.time()
            fun(*args, **kwargs)
            cost = (time.time() - start) * 1e3
            print(self.prefix + f'{cost:.0f} ms')
    # Test the VGG with the training structure
    print(vgg(image))
    Timer(vgg, image)
    # Call the static method of RepVGGBlock to merge the branches of every RepVGGBlock
    RepVGGBlock.merge(vgg)
    print(vgg)
    # Test the VGG with the inference structure
    print(vgg(image))
    Timer(vgg, image)

Output:
RepVGG(
  (stage0): RepVGGBlock(
    (conv_main): Sequential(
      (conv): Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (conv_1x1): Sequential(
      (conv): Conv2d(3, 64, kernel_size=(1, 1), stride=(2, 2), bias=False)
      (bn): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (stage1): Sequential(
    (0): RepVGGBlock(
      (conv_main): Sequential(
        (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (conv_1x1): Sequential(
        (conv): Conv2d(64, 64, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (bn): BatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
  )
  (stage2): Sequential(
    (0): RepVGGBlock(
      (conv_main): Sequential(
        (conv): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (conv_1x1): Sequential(
        (conv): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (bn): BatchNorm(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
  )
  (stage3): Sequential(
    (0): RepVGGBlock(
      (conv_main): Sequential(
        (conv): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (conv_1x1): Sequential(
        (conv): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (bn): BatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
  )
  (stage4): Sequential(
    (0): RepVGGBlock(
      (conv_main): Sequential(
        (conv): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (conv_1x1): Sequential(
        (conv): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (bn): BatchNorm(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
  )
  (gap): AdaptiveAvgPool2d(output_size=1)
  (linear): Linear(in_features=512, out_features=20, bias=True)
)
tensor([[-0.1108, 0.0824, 0.5547, -0.1671, 0.7442, -0.1164, -0.2825, 0.4088,
         0.1239, -0.3792, 0.1152, -0.4021, 0.4034, 0.2350, 0.2601, -0.1197,
         0.2462, -0.2451, 0.0439, -0.2507]], grad_fn=<AddmmBackward>)
Cost: 22 ms
RepVGG(
  (stage0): RepVGGBlock(
    (conv_main): Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  )
  (stage1): Sequential(
    (0): RepVGGBlock(
      (conv_main): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    )
  )
  (stage2): Sequential(
    (0): RepVGGBlock(
      (conv_main): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    )
  )
  (stage3): Sequential(
    (0): RepVGGBlock(
      (conv_main): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    )
  )
  (stage4): Sequential(
    (0): RepVGGBlock(
      (conv_main): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    )
  )
  (gap): AdaptiveAvgPool2d(output_size=1)
  (linear): Linear(in_features=512, out_features=20, bias=True)
)
tensor([[-0.1108, 0.0824, 0.5547, -0.1671, 0.7442, -0.1164, -0.2825, 0.4088,
         0.1239, -0.3792, 0.1152, -0.4021, 0.4034, 0.2350, 0.2601, -0.1197,
         0.2462, -0.2451, 0.0439, -0.2507]], grad_fn=<AddmmBackward>)
Cost: 14 ms
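The check above compares printed outputs by eye and times a single forward pass, which includes first-call overhead and gradient tracking. A somewhat stricter sketch is below: it compares two models with `torch.allclose` and times them after warm-up runs under `torch.no_grad()` with `time.perf_counter()`. To stay self-contained it runs on a toy model and an identical copy as stand-ins; the helper names `outputs_match` and `mean_latency_ms` are hypothetical.

```python
import copy
import time
import torch
from torch import nn


@torch.no_grad()
def outputs_match(model_a: nn.Module, model_b: nn.Module, x: torch.Tensor, atol: float = 1e-4) -> bool:
    """True if the two models, in eval mode, give element-wise close outputs on x."""
    return torch.allclose(model_a.eval()(x), model_b.eval()(x), atol=atol)


@torch.no_grad()
def mean_latency_ms(model: nn.Module, x: torch.Tensor, warmup: int = 3, repeat: int = 10) -> float:
    """Average forward time in ms after warm-up runs (CPU timing; GPU needs torch.cuda.synchronize())."""
    model.eval()
    for _ in range(warmup):
        model(x)
    start = time.perf_counter()
    for _ in range(repeat):
        model(x)
    return (time.perf_counter() - start) / repeat * 1e3


# Toy usage: a small model and an identical copy (stand-ins for the models before/after merging)
torch.manual_seed(0)
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.SiLU())
clone = copy.deepcopy(model)
x = torch.rand(1, 3, 64, 64)
print(outputs_match(model, clone, x))
print(f'{mean_latency_ms(model, x):.2f} ms per forward pass')
```

With the script above, one could keep `train_vgg = copy.deepcopy(vgg)` before calling `RepVGGBlock.merge(vgg)`, then call `outputs_match(train_vgg, vgg, image)` and compare `mean_latency_ms` for both models. The tolerance `atol=1e-4` is a judgment call: merging reorders floating-point operations, so bit-exact equality should not be expected.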