
Transposed Convolution Explained


Hello everyone, nice to see you again. It's your friend, Quan Jun.


The previous article explained convolution. Since I am reorganizing things anyway, I might as well sort out the whole series of concepts in one place, which also lets me show off everything I know.

Transposed convolution (Transposed Convolution) is the later name; at first everyone called it deconvolution (Deconvolution). The concept was proposed for the task of image segmentation, which operates pixel by pixel: every pixel must be classified into one object or another.

It is natural to use a convolutional neural network for this task, which means extracting features with convolutions. But the two main components of a convolutional network, convolution layers and downsampling layers, both shrink the image. That conflicts with pixel-by-pixel classification, which requires the output size to match the input size.

In response, it was proposed to extract features layer by layer with convolution kernels and then gradually restore the feature map to the original size by upsampling, and at first that upsampling was implemented with deconvolution. If convolution and downsampling make the feature map smaller, then upsampling should make it larger again.

We should all be familiar with the convolution output-size formula $out = (F - K + 2P)/S + 1$, where $F$ is the input feature-map size, $K$ the kernel size, $P$ the padding, and $S$ the stride; this is the formula we use to compute the size of a convolution's output feature map. For example, a 4×4 input with a 3×3 kernel, no padding, and stride 1 gives $out = (4 - 3)/1 + 1 = 2$, i.e. a 2×2 output.

The im2col article explained how convolution is implemented: the whole step is a multiplication of two matrices, which we can write as $y = Cx$. To upsample, we want to multiply the output feature map by some parameter matrix that restores the size. By basic linear algebra, left-multiplying the feature map $y$ by $C^T$ gives $C^T y = C^T C x$. The number of columns of $C$ equals the number of rows of $x$, so $C^T C$ has as many rows and columns as $x$ has rows, and the result $C^T y$ has the same shape as $x$. This is where the name "transposed convolution" comes from, and some implementations really do work this way.

A natural conclusion follows: we do not actually have to left-multiply the output feature map by $C^T$. Any matrix of the same shape yields an output the same size as the original feature map, and the operation can itself be realized as a convolution. As long as the shapes are consistent, the matrix can simply be learned during training. The size problem is solved, the correspondence between features is preserved, and the whole thing is trainable: two birds with one stone. A sketch of the $y = Cx$ construction follows below.
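To make the matrix view concrete, here is a minimal NumPy sketch (my own illustration, not from any framework) that builds the convolution matrix $C$ for the 4×4 input / 3×3 kernel / stride-1 example above and checks that $C^T y$ has the same shape as $x$:

```python
import numpy as np

# A minimal sketch (illustration only): write a 3x3, stride-1 convolution
# on a 4x4 input as the matrix multiply y = C x, then check that C^T y
# has the same shape as x.
F, K, S = 4, 3, 1             # input size, kernel size, stride
out = (F - K) // S + 1        # output size: (4 - 3)/1 + 1 = 2

kernel = np.random.randn(K, K)
C = np.zeros((out * out, F * F))          # (4, 16): one row per output pixel
for i in range(out):
    for j in range(out):
        window = np.zeros((F, F))
        window[i*S:i*S + K, j*S:j*S + K] = kernel  # kernel at this position
        C[i * out + j] = window.ravel()

x = np.random.randn(F * F)    # flattened 4x4 input
y = C @ x                     # convolution output: shape (4,), i.e. 2x2
x_up = C.T @ y                # back to the shape of x: (16,), i.e. 4x4
print(y.shape, x_up.shape)    # (4,) (16,)
```

Note that `x_up` only matches `x` in shape, not in values; this is exactly the point made below.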
In the im2col formulation, convolution multiplies a $(C_{out}, C_{in} \cdot K_h \cdot K_w)$ kernel matrix by a $(C_{in} \cdot K_h \cdot K_w, H_N \cdot W_N)$ feature-map matrix to obtain a $(C_{out}, H_N \cdot W_N)$ result. Now transpose the kernel matrix: $(C_{in} \cdot K_h \cdot K_w, C_{out})$ times $(C_{out}, H_N \cdot W_N)$ gives a $(C_{in} \cdot K_h \cdot K_w, H_N \cdot W_N)$ feature map.

A few things are worth adding. Caffe, for instance, has not only the im2col function but also col2im, the inverse operation of im2col, and it is col2im that Caffe uses to convert the result above back into a feature map. However, col2im is the inverse of im2col only in shape: running im2col on a feature map and then col2im does not reproduce the original, because values from overlapping patches are summed together.

TensorFlow and PyTorch are different: both implement transposed convolution by expanding the feature map, inflating it with zero filling, possibly followed by a crop operation. The filling is needed because we want to realize transposed convolution directly as an ordinary convolution; once extra values are inserted, the convolved feature map naturally comes out larger.

In neither case is the original input of the convolution restored; only its shape is.

Finally, we can work out the shape calculation. Transposed convolution is the shape inverse of convolution, so its size formula is the inverse of $out = (F - K + 2P)/S + 1$. Solving for the input size gives $F = (out - 1) \cdot S + K - 2P + mod$, where $mod$ is the remainder that the division by $S$ discards in the original formula.
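As a quick check of the "shape inverse, not value inverse" point, here is a minimal PyTorch sketch. With $K=3$, $S=1$, $P=0$ the inverse formula gives $(2-1) \cdot 1 + 3 = 4$, and `ConvTranspose2d` reproduces exactly that shape while the values differ from the input:

```python
import torch
import torch.nn as nn

# A minimal sketch: ConvTranspose2d inverts the shape of Conv2d, not its
# values. With K=3, S=1, P=0 the inverse formula gives (2-1)*1 + 3 = 4.
x = torch.randn(1, 1, 4, 4)                       # N, C, H, W
conv = nn.Conv2d(1, 1, kernel_size=3)             # 4x4 -> 2x2
deconv = nn.ConvTranspose2d(1, 1, kernel_size=3)  # 2x2 -> 4x4

y = conv(x)
x_rec = deconv(y)
print(y.shape)                   # torch.Size([1, 1, 2, 2])
print(x_rec.shape)               # torch.Size([1, 1, 4, 4])
print(torch.allclose(x, x_rec))  # False: only the shape is restored
```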

This formula computes the output size of the transposed convolution. Because $mod$ is the remainder of a division by $S$, it satisfies $0 \le mod < S$: when $S = 1$ it can only be 0, so there is a single solution, but when $S > 1$ there are multiple solutions. That is why, once the parameters are specified, PyTorch and TensorFlow pad around the feature map before convolving. The multiplication by $S$ in the formula is realized by expanding the feature map: between every two cells, $S - 1$ values (zeros) are inserted, so after expansion a feature map of size $out$ becomes $(out - 1) \cdot S + 1$. The outside is then padded according to $mod$, and in practice the framework keeps padding according to the requested output size before the final convolution, which makes the exact bookkeeping more involved. My main purpose here is to make the expansion itself clear; the figure below (and the code sketch after it) shows the $S = 2$ case.

[Figure: expansion of a feature map for $S = 2$, with one zero inserted between every pair of adjacent cells.]
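Here is a minimal sketch of that expansion (a hand-rolled `dilate` helper of my own, not a framework API): inserting $s - 1$ zeros between neighbouring cells grows a map of size $n$ to $(n - 1) \cdot s + 1$, shown for $s = 2$ on a 3×3 map:

```python
import torch

# A minimal sketch of the expansion (a hand-rolled helper, not a framework
# API): insert s-1 zeros between neighbouring cells of a feature map, so
# size n grows to (n - 1) * s + 1. Shown here for s = 2 on a 3x3 map.
def dilate(x, s):
    n = x.shape[-1]
    m = (n - 1) * s + 1
    out = torch.zeros(*x.shape[:-2], m, m)
    out[..., ::s, ::s] = x          # original values land s cells apart
    return out

fm = torch.arange(1., 10.).reshape(1, 1, 3, 3)
print(dilate(fm, 2).squeeze())
# tensor([[1., 0., 2., 0., 3.],
#         [0., 0., 0., 0., 0.],
#         [4., 0., 5., 0., 6.],
#         [0., 0., 0., 0., 0.],
#         [7., 0., 8., 0., 9.]])
```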

You can match this formula against the parameters of the TensorFlow or PyTorch interfaces; since I don't want to go into interface details, I won't repeat them here, but a small sketch of the correspondence follows below. For further understanding, I also drew the following picture.

[Figure: additional illustration of transposed convolution.]
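As one concrete correspondence (a sketch, assuming PyTorch's documented behaviour): `ConvTranspose2d` exposes the $mod$ term of $F = (out - 1) \cdot S + K - 2P + mod$ as its `output_padding` parameter, so with $S = 2$ both $mod = 0$ and $mod = 1$ are valid:

```python
import torch
import torch.nn as nn

# A minimal sketch: PyTorch exposes the mod term of
# F = (out - 1)*S + K - 2P + mod as `output_padding`. With out=3, K=3,
# S=2, P=0, both mod=0 and mod=1 are valid, giving 7x7 or 8x8.
y = torch.randn(1, 1, 3, 3)
for mod in (0, 1):
    up = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2,
                            output_padding=mod)
    print(mod, tuple(up(y).shape))  # 0 -> (1, 1, 7, 7); 1 -> (1, 1, 8, 8)
```

Either value is a legitimate preimage: a stride-2, 3×3 convolution maps both a 7×7 and an 8×8 input to 3×3, which is exactly the ambiguity the $mod$ term resolves.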
