
Transpose convolution learning notes


Hello everyone, good to see you again. I'm your friend Quan Jun.


1. Transpose convolution definition

In semantic segmentation we need a prediction for every pixel, and that creates a problem: a two-dimensional convolutional network progressively compresses the input image, so the final prediction sits at a much lower resolution, yet we still have to assign a class to each original pixel. For example, when recognizing cats and dogs it is not enough to locate the cat; we must label every pixel belonging to the cat. Transpose convolution addresses this: it enlarges feature maps step by step until the generated output has the same size as the original image, which makes per-pixel segmentation straightforward.
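As a rough illustration (a minimal sketch I added, not code from this post; the channel counts and the 320 x 480 image size are arbitrary, and the kernel_size=64, stride=32, padding=16 head is one common choice for 32x upsampling), a segmentation network can end with a transpose convolution that restores the input resolution:

import torch
from torch import nn

num_classes = 2  # hypothetical: e.g. cat vs. background

net = nn.Sequential(
	nn.Conv2d(3, 64, kernel_size=3, stride=32, padding=1),  # stand-in backbone: downsample 32x
	nn.Conv2d(64, num_classes, kernel_size=1),              # per-pixel class scores
	nn.ConvTranspose2d(num_classes, num_classes,
					   kernel_size=64, stride=32, padding=16),  # upsample 32x
)

x = torch.rand(1, 3, 320, 480)
print(net(x).shape)  # torch.Size([1, 2, 320, 480]) -- back to the input size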

2. Custom transpose convolution

Concretely, with convolution kernel K = torch.tensor([[0, 1], [2, 3]]), every element $x_{i,j}$ of the input is multiplied by the whole kernel, the scaled copy is placed at offset (i, j) in the output, and the overlapping entries are summed to produce the result.

  • Code
# 1. Import libraries
import torch
from torch import nn


# 2. Define the transpose-convolution function
def tran_conv(x, k):
	h, w = k.shape
	# The output is larger than the input: (h_in + h_k - 1, w_in + w_k - 1)
	y = torch.zeros((x.shape[0] + h - 1, x.shape[1] + w - 1))
	for i in range(x.shape[0]):
		for j in range(x.shape[1]):
			# Scale the kernel by x[i, j], place it at offset (i, j),
			# and accumulate the overlapping entries
			y[i:i + h, j:j + w] += x[i, j] * k
	return y


# 3. Define the input tensor X and the transpose-convolution kernel K
X = torch.Tensor([[0, 1], [2, 3]])
K = torch.Tensor([[0, 1], [2, 3]])

# 4. Compute the output Y
Y = tran_conv(X, K)
print(f'Y={Y}')

# 5. Reshape X and K into four-dimensional tensors
#    (batch, channel, height, width), as the PyTorch modules expect
X_conv = X.reshape(1, 1, 2, 2)
K_conv = K.reshape(1, 1, 2, 2)

# 6. Define a 2-D transpose convolution: input channels 1, output
#    channels 1, 2 x 2 kernel, no bias
tconv = nn.ConvTranspose2d(1, 1, kernel_size=2, bias=False)

# 7. Assign K_conv to the weight of the transpose convolution
tconv.weight.data = K_conv

# 8. Input tensor (X_conv) -> transpose convolution (tconv + K_conv) -> output tensor
Y_conv = tconv(X_conv)

# 9. Squeeze out the batch and channel dimensions (both 1) so that the
#    result can be compared with the custom computation
Y_conv_squeeze = Y_conv.squeeze()
print(f'Y_conv={Y_conv}')
print(f'Y_conv_squeeze={Y_conv_squeeze}')

# 10. Check that the custom function matches the official module
print(f'Y == Y_conv_squeeze:{Y == Y_conv_squeeze}')
  • result
Y=tensor([[ 0.,  0.,  1.],
        [ 0.,  4.,  6.],
        [ 4., 12.,  9.]])
Y_conv=tensor([[[[ 0.,  0.,  1.],
          [ 0.,  4.,  6.],
          [ 4., 12.,  9.]]]], grad_fn=<SlowConvTranspose2DBackward>)
Y_conv_squeeze=tensor([[ 0.,  0.,  1.],
        [ 0.,  4.,  6.],
        [ 4., 12.,  9.]], grad_fn=<SqueezeBackward0>)
Y == Y_conv_squeeze:tensor([[True, True, True],
        [True, True, True],
        [True, True, True]])

3. Transpose convolution with padding and stride

  • padding: acts on the output tensor; padding rows and columns are removed from each side of the output.
  • stride: acts on the intermediate matrix; each scaled copy of the kernel is placed stride positions apart, so the output grows. In general the output size is (n − 1) · stride − 2 · padding + kernel_size (a small helper after the results below checks this).
  • Code
# 1. Reuse the setup from Section 2: tran_conv plus the tensors below
import torch
from torch import nn

X = torch.Tensor([[0, 1], [2, 3]])
K = torch.Tensor([[0, 1], [2, 3]])
X_conv = X.reshape(1, 1, 2, 2)
K_conv = K.reshape(1, 1, 2, 2)
# 2. padding=1: the transpose convolution removes padding=1 rows and
#    columns from each side of the output, so the 3 x 3 result shrinks to 1 x 1
tconv_padding_1 = nn.ConvTranspose2d(1, 1, kernel_size=2, padding=1, bias=False)
tconv_padding_1.weight.data = K_conv
Y_conv_padding_1 = tconv_padding_1(X_conv)
print(f'Y_conv_padding_1={Y_conv_padding_1}')

# 3. stride=2: the scaled kernel copies are placed two positions apart.
#    Input is 2 x 2, kernel is 2 x 2, stride=2, so the output Y has size
#    (2 - 1) * 2 + (2 - 1) + 1 = 4
tconv_stride_2 = nn.ConvTranspose2d(1, 1, kernel_size=2, stride=2, bias=False)
tconv_stride_2.weight.data = K_conv
Y_conv_stride_2 = tconv_stride_2(X_conv)
print(f'Y_conv_stride_2={Y_conv_stride_2}')
  • result
Y_conv_padding_1=tensor([[[[4.]]]], grad_fn=<SlowConvTranspose2DBackward>)
Y_conv_stride_2=tensor([[[[0., 0., 0., 1.],
          [0., 0., 2., 3.],
          [0., 2., 0., 3.],
          [4., 6., 6., 9.]]]], grad_fn=<SlowConvTranspose2DBackward>)
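As a quick cross-check of the shapes above, here is a small helper of my own (not from the original post) that evaluates PyTorch's output-size formula for ConvTranspose2d, assuming dilation=1 and output_padding=0:

def tconv_out_size(n, kernel_size, stride=1, padding=0):
	# out = (n - 1) * stride - 2 * padding + kernel_size
	return (n - 1) * stride - 2 * padding + kernel_size

print(tconv_out_size(2, kernel_size=2))             # 3 -> the 3 x 3 output Y
print(tconv_out_size(2, kernel_size=2, padding=1))  # 1 -> Y_conv_padding_1
print(tconv_out_size(2, kernel_size=2, stride=2))   # 4 -> Y_conv_stride_2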

4. Transpose convolution from the perspective of convolution [key point]

4.1 Explanation

Transpose convolution is itself a kind of convolution:

  • It rearranges the inputs and the kernels (the matrix view at the end of this section makes this precise).
  • Whereas an ordinary convolution is usually used for downsampling, a transpose convolution is usually used for upsampling.
  • If a convolution maps an input of shape (h, w) to (h', w'), a transpose convolution with the same hyperparameters maps (h', w') back to (h, w), as the sketch after this list shows.
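The last point is easy to verify in code (a minimal sketch I added; the 16 x 16 shape and the channel counts are arbitrary): a convolution and a transpose convolution constructed with identical hyperparameters undo each other's effect on the shape.

import torch
from torch import nn

X = torch.rand(1, 10, 16, 16)
conv = nn.Conv2d(10, 20, kernel_size=5, padding=2, stride=3)
tconv = nn.ConvTranspose2d(20, 10, kernel_size=5, padding=2, stride=3)
# conv maps (16, 16) -> (6, 6); tconv maps (6, 6) -> (16, 16)
print(tconv(conv(X)).shape == X.shape)  # True

When flooring in the stride arithmetic prevents an exact round trip, ConvTranspose2d's output_padding argument can make up the difference.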

4.2 Padding 0, stride 1

The output size is n + k − 1: every input element contributes a full copy of the kernel.

4.3 Padding p, stride 1

The output size is n + k − 1 − 2p: p rows and columns are trimmed from each side.

4.4 Padding p, stride s

The output size is (n − 1) · s + k − 2p: the kernel copies are placed s positions apart before trimming.
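To see where the name "transpose" comes from (a sketch along the lines of the d2l chapter; kernel2matrix is a helper I am adding for illustration): write the convolution as a matrix multiplication y = Wx over the flattened input; multiplying by the transposed matrix W.T then performs exactly the transpose convolution.

import torch

def kernel2matrix(K):
	# Rewrite the 2 x 2 kernel K as a 4 x 9 matrix W so that convolving
	# a 3 x 3 input X equals W @ X.reshape(9)
	k, W = torch.zeros(5), torch.zeros((4, 9))
	k[:2], k[3:5] = K[0, :], K[1, :]
	W[0, 0:5], W[1, 1:6], W[2, 3:8], W[3, 4:9] = k, k, k, k
	return W

K = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
W = kernel2matrix(K)
Y = torch.arange(4.0).reshape(2, 2)
# Multiplying the flattened 2 x 2 input by W.T gives the same 3 x 3 result
# as the tran_conv function from Section 2
Z = (W.T @ Y.reshape(-1)).reshape(3, 3)
print(Z == tran_conv(Y, K))  # all True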

5. Initialization of transpose convolution

Like an ordinary convolution, a transpose convolution needs its kernel initialized. For upsampling layers the kernel is commonly initialized with bilinear interpolation; the code is as follows.
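Written as a formula (my restatement of what the code below computes), the bilinear kernel is the product of two triangular ramps:

$$W[i,j] = \Big(1 - \frac{|i - c|}{f}\Big)\Big(1 - \frac{|j - c|}{f}\Big), \qquad f = \Big\lfloor \frac{k+1}{2} \Big\rfloor, \qquad c = \begin{cases} f - 1 & k \text{ odd} \\ f - 0.5 & k \text{ even} \end{cases}$$

For k = 4 this gives f = 2 and c = 1.5, so the ramp values are (0.25, 0.75, 0.75, 0.25) and their outer product is the 4 x 4 matrix seen in the results below.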

  • Code
# -*- coding: utf-8 -*-
# @Project: zc
# @Author: zc
# @File name: os_test
# @Create time: 2022/1/4 8:38


import torch


def bilinear_kernel(in_channels, out_channels, kernel_size):
	"""Build a bilinear-interpolation kernel.

	:param in_channels: number of input channels
	:param out_channels: number of output channels
	:param kernel_size: convolution kernel size
	:return: weight tensor of shape (in_channels, out_channels, kernel_size, kernel_size)
	"""
	# For kernel_size = 4: factor = 2
	factor = (kernel_size + 1) // 2
	# For kernel_size = 4: center = 1.5
	if kernel_size % 2 == 1:
		center = factor - 1
	else:
		center = factor - 0.5
	# Create a tuple of index grids: og[0] has shape (kernel_size, 1),
	# og[1] has shape (1, kernel_size)
	og = (torch.arange(kernel_size).reshape(-1, 1),
		  torch.arange(kernel_size).reshape(1, -1))
	# Bilinear interpolation: for kernel_size = 4 this produces a 4 x 4 matrix
	filt = (1 - torch.abs(og[0] - center) / factor) * (1 - torch.abs(og[1] - center) / factor)
	# Allocate an all-zero weight tensor of shape
	# (in_channels, out_channels, kernel_size, kernel_size)
	weight = torch.zeros((in_channels, out_channels,
						  kernel_size, kernel_size))
	# Place filt on the channel diagonal, i.e. weight[c, c] = filt:
	# [[filt,  0,    0  ],
	#  [ 0,   filt,  0  ],
	#  [ 0,    0,   filt]]
	weight[range(in_channels), range(out_channels), :, :] = filt
	return weight


# y: [3, 3, 4, 4]
y = bilinear_kernel(3, 3, 4)

print(f'y={y}')
print(f'y_shape={y.shape}')
print(f'y0={y[0]}')
print(f'y1={y[1]}')
print(f'y2={y[2]}')
  • result
y=tensor([[[[0.0625, 0.1875, 0.1875, 0.0625],
          [0.1875, 0.5625, 0.5625, 0.1875],
          [0.1875, 0.5625, 0.5625, 0.1875],
          [0.0625, 0.1875, 0.1875, 0.0625]],

         [[0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000]],

         [[0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000]]],


        [[[0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000]],

         [[0.0625, 0.1875, 0.1875, 0.0625],
          [0.1875, 0.5625, 0.5625, 0.1875],
          [0.1875, 0.5625, 0.5625, 0.1875],
          [0.0625, 0.1875, 0.1875, 0.0625]],

         [[0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000]]],


        [[[0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000]],

         [[0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000]],

         [[0.0625, 0.1875, 0.1875, 0.0625],
          [0.1875, 0.5625, 0.5625, 0.1875],
          [0.1875, 0.5625, 0.5625, 0.1875],
          [0.0625, 0.1875, 0.1875, 0.0625]]]])
y_shape=torch.Size([3, 3, 4, 4])
y0=tensor([[[0.0625, 0.1875, 0.1875, 0.0625],
         [0.1875, 0.5625, 0.5625, 0.1875],
         [0.1875, 0.5625, 0.5625, 0.1875],
         [0.0625, 0.1875, 0.1875, 0.0625]],

        [[0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000]],

        [[0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000]]])
y1=tensor([[[0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000]],

        [[0.0625, 0.1875, 0.1875, 0.0625],
         [0.1875, 0.5625, 0.5625, 0.1875],
         [0.1875, 0.5625, 0.5625, 0.1875],
         [0.0625, 0.1875, 0.1875, 0.0625]],

        [[0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000]]])
y2=tensor([[[0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000]],

        [[0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000]],

        [[0.0625, 0.1875, 0.1875, 0.0625],
         [0.1875, 0.5625, 0.5625, 0.1875],
         [0.1875, 0.5625, 0.5625, 0.1875],
         [0.0625, 0.1875, 0.1875, 0.0625]]])

6. Applying transpose convolution to images

With a transpose convolution we can double the height and width of an image.
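A quick size check before the code (my arithmetic, using PyTorch's output-size formula for ConvTranspose2d): with kernel_size=4, padding=1, stride=2 the output height is

(h − 1) · 2 − 2 · 1 + 4 = 2h,

so a 256 x 256 input comes out as 512 x 512, exactly doubled.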

  • Code
# -*- coding: utf-8 -*-
# @Project: zc
# @Author: zc
# @File name: os_test
# @Create time: 2022/1/4 8:38
import os
import matplotlib.pyplot as plt
import torch
import torchvision.transforms
from torch import nn
from d2l import torch as d2l


# bilinear_kernel(in_channels, out_channels, kernel_size): reuse the
# definition from Section 5 (omitted here to avoid repeating it)


conv_trans = nn.ConvTranspose2d(3, 3, kernel_size=4, padding=1, stride=2, bias=False)
conv_trans.weight.data.copy_(bilinear_kernel(3, 3, 4))
path = os.path.join(os.getcwd(), 'img', 'banana.jpg')  # e.g. 'D:\\zc\\img\\banana.jpg'
print(f'path={path}')
# in_img: (3, 256, 256)
in_img = torchvision.transforms.ToTensor()(d2l.Image.open(path))
# X: (1, 3, 256, 256)
X = in_img.unsqueeze(0)
# Y: (1, 3, 512, 512) -- the transpose convolution has stride=2, so the
# height and width are doubled
Y = conv_trans(X)
# out_img: [512, 512, 3]
out_img = Y[0].permute(1, 2, 0).detach()
d2l.set_figsize()
# in_img.permute(1, 2, 0): [256, 256, 3]
print('input image shape:', in_img.permute(1, 2, 0).shape)
# d2l.plt.imshow(in_img.permute(1, 2, 0))
print('output_image_shape:', out_img.shape)
d2l.plt.imshow(out_img)
print('output_image_shape_after:', out_img.shape)
plt.show()
  • result
path=D:\zc\img\banana.jpg
input image shape: torch.Size([256, 256, 3])
output_image_shape: torch.Size([512, 512, 3])
output_image_shape_after: torch.Size([512, 512, 3])
