MXNet implementation of DenseNet (densely connected network)
2022-07-25 13:40:00 【Yinque Guangqian】
Paper: Densely Connected Convolutional Networks
DenseNet is quite similar to the ResNet covered earlier. In ResNet, gradients can flow directly from later layers to earlier ones through the identity mapping (the pre-activation output is added, across layers, to an earlier layer's output). However, combining features by summation may impede the flow of information through the network. DenseNet therefore changes the design: the output of each layer is passed across layers to every subsequent layer, so every later layer receives direct input from all preceding layers; instead of being added, these inputs are concatenated along the channel dimension. This is easy to see in the diagram: a dense block with L layers has L(L+1)/2 connections. For example, 3 layers give 6 connections and 4 layers give 10, whereas a traditional L-layer network has only L connections.
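To make the distinction concrete, here is a minimal sketch (my addition, not from the original post) contrasting the two combination rules with MXNet NDArrays; the shapes are arbitrary and purely illustrative:

from mxnet import nd

X = nd.random.uniform(shape=(1, 3, 8, 8))  # an input with 3 channels
Y = nd.random.uniform(shape=(1, 3, 8, 8))  # a layer's output of the same shape

# ResNet-style combination: element-wise sum, the channel count stays at 3
print((X + Y).shape)                  # (1, 3, 8, 8)

# DenseNet-style combination: concatenation along the channel dimension (dim=1),
# the channel count grows to 3 + 3 = 6
print(nd.concat(X, Y, dim=1).shape)   # (1, 6, 8, 8)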

Because each layer is densely connected to every other layer, such a model is called a "densely connected network" or a "dense convolutional network".
The paper also mentions a "bottleneck" design (the same idea as in ResNet) that works very well for DenseNet: a 1x1 convolution is inserted before each 3x3 convolution to reduce the number of input feature maps. A model with this design is called DenseNet-B.
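A minimal sketch of what such a bottleneck block could look like in Gluon (this helper is my own illustration, not part of the original code; the 4x width of the 1x1 convolution follows the choice made in the paper):

from mxnet.gluon import nn

def bottleneck_conv_block(num_channels):
    # BN -> ReLU -> 1x1 conv (reduce feature maps) -> BN -> ReLU -> 3x3 conv
    blk = nn.Sequential()
    blk.add(nn.BatchNorm(), nn.Activation('relu'),
            nn.Conv2D(4 * num_channels, kernel_size=1),
            nn.BatchNorm(), nn.Activation('relu'),
            nn.Conv2D(num_channels, kernel_size=3, padding=1))
    return blk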
To further improve the compactness of the model, we can reduce the number of feature maps at the transition layers. If a dense block produces m feature maps, we let the following transition layer generate ⌊θm⌋ output feature maps, where 0 < θ ≤ 1 is the compression factor. When θ = 1, the number of feature maps across the transition layer stays unchanged; a model with θ < 1 is called DenseNet-C.
If both the bottleneck layers and transition layers with θ < 1 are used, the model is called DenseNet-BC. DenseNet-BC models have very few parameters yet perform very well: a DenseNet-BC with 0.8M parameters reaches accuracy comparable to a 1001-layer (pre-activation) ResNet with 10.2M parameters.
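As an illustration of the compression factor (again my own sketch, not from the original code; the transition_block defined later in this post simply takes the target channel count directly, which amounts to choosing θ by hand):

from mxnet.gluon import nn

def compressed_transition_block(num_input_channels, theta=0.5):
    # Keep floor(theta * m) feature maps, then halve the width and height
    num_output_channels = int(theta * num_input_channels)
    blk = nn.Sequential()
    blk.add(nn.BatchNorm(), nn.Activation('relu'),
            nn.Conv2D(num_output_channels, kernel_size=1),
            nn.AvgPool2D(pool_size=2, strides=2))
    return blk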
The paper compares results on several datasets, in particular against ResNet, its closest relative, and shows that DenseNets use fewer parameters while achieving lower error rates, as pictured below:

The architecture of the full DenseNet is shown below:
Building dense blocks
import d2lzh as d2l
from mxnet import gluon, init, nd
from mxnet.gluon import nn

# Convolution block, modified from the one used in ResNet
# BN -> ReLU -> 3x3 convolution
def conv_block(num_channels):
    blk = nn.Sequential()
    blk.add(nn.BatchNorm(), nn.Activation('relu'),
            nn.Conv2D(num_channels, kernel_size=3, padding=1))
    return blk
# Dense block
# Composed of several conv_blocks, each with the same number of output channels
# In the forward pass, the input and output of every block are concatenated along
# the channel dimension, so each block sees the outputs of all preceding blocks
class DenseBlock(nn.Block):
    def __init__(self, num_convs, num_channels, **kwargs):
        super(DenseBlock, self).__init__(**kwargs)
        self.net = nn.Sequential()
        for _ in range(num_convs):
            self.net.add(conv_block(num_channels))

    def forward(self, X):
        for blk in self.net:
            Y = blk(X)
            # Concatenate input and output along the channel dimension
            X = nd.concat(X, Y, dim=1)
        return X
# Observe how the shape changes, especially the number of channels
blk = DenseBlock(4, 10)  # a dense block of 4 conv blocks with 10 channels each
blk.initialize()
X = nd.random.uniform(shape=(4, 5, 22, 22))
XX = blk(X)
print(XX.shape)  # channels: 5 + 4*10 = 45
# (4, 45, 22, 22)

We can see that the number of channels has grown. If it grows too much, the model becomes overly complex, so a transition layer is used: a 1x1 convolution layer reduces the number of channels, and an average pooling layer with a stride of 2 halves the width and height, further reducing the model's complexity.
Transition layer
def transition_block(num_channels):
    blk = nn.Sequential()
    blk.add(nn.BatchNorm(), nn.Activation('relu'),
            nn.Conv2D(num_channels, kernel_size=1),
            nn.AvgPool2D(pool_size=2, strides=2))
    return blk
blk = transition_block(10)
blk.initialize()
print(blk(XX).shape)
# (4, 10, 11, 11)

DenseNet model construction and training
# DenseNet model
net = nn.Sequential()
net.add(nn.Conv2D(64, kernel_size=7, strides=2, padding=3),
        nn.BatchNorm(), nn.Activation('relu'),
        nn.MaxPool2D(pool_size=3, strides=2, padding=1))
# num_channels is the current channel count; it is halved by each transition layer.
# growth_rate is the number of output channels of each conv block in a dense block.
num_channels, growth_rate = 64, 32
num_convs_in_dense_blocks = [4, 4, 4, 4]  # 4 dense blocks, each with 4 conv blocks
for i, num_convs in enumerate(num_convs_in_dense_blocks):
    net.add(DenseBlock(num_convs, growth_rate))
    # Channel count at the output of this dense block
    num_channels += num_convs * growth_rate
    # Between dense blocks, add a transition layer that halves the channel count
    if i != len(num_convs_in_dense_blocks) - 1:
        num_channels //= 2
        net.add(transition_block(num_channels))
# Finally, attach a global average pooling layer and a fully connected layer
net.add(nn.BatchNorm(), nn.Activation('relu'), nn.GlobalAvgPool2D(), nn.Dense(10))
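As a sanity check on the channel bookkeeping (this snippet is my addition, not part of the original post), we can push a dummy 48x48 single-channel input through the network and print every block's output shape; across the dense and transition blocks the channel count should go 64 -> 192 -> 96 -> 224 -> 112 -> 240 -> 120 -> 248:

# The initialize() call here is harmless: the training code below re-initializes
# the parameters with force_reinit=True.
net.initialize()
X = nd.random.uniform(shape=(1, 1, 48, 48))
for layer in net:
    X = layer(X)
    print(layer.name, 'output shape:', X.shape)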
# Train the model. Because the model is fairly deep, the input width and height are reduced from 224 to 48 to simplify the computation; otherwise an out-of-memory error is raised.
lr, num_epochs, batch_size, ctx = 0.1, 5, 256, d2l.try_gpu()
net.initialize(force_reinit=True, ctx=ctx, init=init.Xavier())
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': lr})
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=48)
d2l.train_ch5(net, train_iter, test_iter, batch_size, trainer, ctx, num_epochs)
'''
epoch 1, loss 0.4878, train acc 0.821, test acc 0.871, time 35.3 sec
epoch 2, loss 0.3063, train acc 0.885, test acc 0.862, time 32.2 sec
epoch 3, loss 0.2618, train acc 0.902, test acc 0.865, time 32.2 sec
epoch 4, loss 0.2367, train acc 0.911, test acc 0.909, time 31.9 sec
epoch 5, loss 0.2146, train acc 0.919, test acc 0.905, time 31.8 sec
'''