
Transpose convolution learning notes


Hello everyone, good to see you again. I'm your friend Quan Jun.


1. Transpose convolution definition

In semantic segmentation we need a prediction for every pixel, and that creates a problem: a two-dimensional convolutional network progressively compresses the input image, so the final prediction sits at a much lower resolution, yet we still have to assign a class to each original pixel. For example, when recognizing cats and dogs it is not enough to locate the cat; we must label every pixel belonging to the cat. Transpose convolution addresses this: it enlarges feature maps step by step until the generated output has the same size as the original image, which makes per-pixel segmentation straightforward.
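As a rough illustration (a minimal sketch I added, not code from this post; the channel counts and the 320 x 480 image size are arbitrary, and the kernel_size=64, stride=32, padding=16 head is one common choice for 32x upsampling), a segmentation network can end with a transpose convolution that restores the input resolution:

import torch
from torch import nn

num_classes = 2  # hypothetical: e.g. cat vs. background

net = nn.Sequential(
	nn.Conv2d(3, 64, kernel_size=3, stride=32, padding=1),  # stand-in backbone: downsample 32x
	nn.Conv2d(64, num_classes, kernel_size=1),              # per-pixel class scores
	nn.ConvTranspose2d(num_classes, num_classes,
					   kernel_size=64, stride=32, padding=16),  # upsample 32x
)

x = torch.rand(1, 3, 320, 480)
print(net(x).shape)  # torch.Size([1, 2, 320, 480]) -- back to the input size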

2. Custom transpose convolution

Concretely, with convolution kernel K = torch.tensor([[0, 1], [2, 3]]), every element $x_{i,j}$ of the input is multiplied by the whole kernel, the scaled copy is placed at offset (i, j) in the output, and the overlapping entries are summed to produce the result.

  • Code
# 1. Import libraries
import torch
from torch import nn


# 2. Define the transpose-convolution function
def tran_conv(x, k):
	h, w = k.shape
	# The output is larger than the input: (h_in + h_k - 1, w_in + w_k - 1)
	y = torch.zeros((x.shape[0] + h - 1, x.shape[1] + w - 1))
	for i in range(x.shape[0]):
		for j in range(x.shape[1]):
			# Scale the kernel by x[i, j], place it at offset (i, j),
			# and accumulate the overlapping entries
			y[i:i + h, j:j + w] += x[i, j] * k
	return y


# 3. Define the input tensor X and the transpose-convolution kernel K
X = torch.Tensor([[0, 1], [2, 3]])
K = torch.Tensor([[0, 1], [2, 3]])

# 4. Compute the output Y
Y = tran_conv(X, K)
print(f'Y={Y}')

# 5. Reshape X and K into four-dimensional tensors
#    (batch, channel, height, width), as the PyTorch modules expect
X_conv = X.reshape(1, 1, 2, 2)
K_conv = K.reshape(1, 1, 2, 2)

# 6. Define a 2-D transpose convolution: input channels 1, output
#    channels 1, 2 x 2 kernel, no bias
tconv = nn.ConvTranspose2d(1, 1, kernel_size=2, bias=False)

# 7. Assign K_conv to the weight of the transpose convolution
tconv.weight.data = K_conv

# 8. Input tensor (X_conv) -> transpose convolution (tconv + K_conv) -> output tensor
Y_conv = tconv(X_conv)

# 9. Squeeze out the batch and channel dimensions (both 1) so that the
#    result can be compared with the custom computation
Y_conv_squeeze = Y_conv.squeeze()
print(f'Y_conv={Y_conv}')
print(f'Y_conv_squeeze={Y_conv_squeeze}')

# 10. Check that the custom function matches the official module
print(f'Y == Y_conv_squeeze:{Y == Y_conv_squeeze}')
  • result
Y=tensor([[ 0.,  0.,  1.],
        [ 0.,  4.,  6.],
        [ 4., 12.,  9.]])
Y_conv=tensor([[[[ 0.,  0.,  1.],
          [ 0.,  4.,  6.],
          [ 4., 12.,  9.]]]], grad_fn=<SlowConvTranspose2DBackward>)
Y_conv_squeeze=tensor([[ 0.,  0.,  1.],
        [ 0.,  4.,  6.],
        [ 4., 12.,  9.]], grad_fn=<SqueezeBackward0>)
Y == Y_conv_squeeze:tensor([[True, True, True],
        [True, True, True],
        [True, True, True]])

3. Transpose convolution with padding and stride

  • padding: acts on the output tensor; padding rows and columns are removed from each side of the output.
  • stride: acts on the intermediate matrix; each scaled copy of the kernel is placed stride positions apart, so the output grows. In general the output size is (n − 1) · stride − 2 · padding + kernel_size (a small helper after the results below checks this).
  • Code
# 1. Reuse the setup from Section 2: tran_conv plus the tensors below
import torch
from torch import nn

X = torch.Tensor([[0, 1], [2, 3]])
K = torch.Tensor([[0, 1], [2, 3]])
X_conv = X.reshape(1, 1, 2, 2)
K_conv = K.reshape(1, 1, 2, 2)
# 2. padding=1: the transpose convolution removes padding=1 rows and
#    columns from each side of the output, so the 3 x 3 result shrinks to 1 x 1
tconv_padding_1 = nn.ConvTranspose2d(1, 1, kernel_size=2, padding=1, bias=False)
tconv_padding_1.weight.data = K_conv
Y_conv_padding_1 = tconv_padding_1(X_conv)
print(f'Y_conv_padding_1={Y_conv_padding_1}')

# 3. stride=2: the scaled kernel copies are placed two positions apart.
#    Input is 2 x 2, kernel is 2 x 2, stride=2, so the output Y has size
#    (2 - 1) * 2 + (2 - 1) + 1 = 4
tconv_stride_2 = nn.ConvTranspose2d(1, 1, kernel_size=2, stride=2, bias=False)
tconv_stride_2.weight.data = K_conv
Y_conv_stride_2 = tconv_stride_2(X_conv)
print(f'Y_conv_stride_2={Y_conv_stride_2}')
  • result
Y_conv_padding_1=tensor([[[[4.]]]], grad_fn=<SlowConvTranspose2DBackward>)
Y_conv_stride_2=tensor([[[[0., 0., 0., 1.],
          [0., 0., 2., 3.],
          [0., 2., 0., 3.],
          [4., 6., 6., 9.]]]], grad_fn=<SlowConvTranspose2DBackward>)
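As a quick cross-check of the shapes above, here is a small helper of my own (not from the original post) that evaluates PyTorch's output-size formula for ConvTranspose2d, assuming dilation=1 and output_padding=0:

def tconv_out_size(n, kernel_size, stride=1, padding=0):
	# out = (n - 1) * stride - 2 * padding + kernel_size
	return (n - 1) * stride - 2 * padding + kernel_size

print(tconv_out_size(2, kernel_size=2))             # 3 -> the 3 x 3 output Y
print(tconv_out_size(2, kernel_size=2, padding=1))  # 1 -> Y_conv_padding_1
print(tconv_out_size(2, kernel_size=2, stride=2))   # 4 -> Y_conv_stride_2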

4. Transpose convolution from the perspective of convolution [key point]

4.1 Explanation

Transpose convolution is itself a kind of convolution:

  • It rearranges the inputs and the kernels (the matrix view at the end of this section makes this precise).
  • Whereas an ordinary convolution is usually used for downsampling, a transpose convolution is usually used for upsampling.
  • If a convolution maps an input of shape (h, w) to (h', w'), a transpose convolution with the same hyperparameters maps (h', w') back to (h, w), as the sketch after this list shows.
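The last point is easy to verify in code (a minimal sketch I added; the 16 x 16 shape and the channel counts are arbitrary): a convolution and a transpose convolution constructed with identical hyperparameters undo each other's effect on the shape.

import torch
from torch import nn

X = torch.rand(1, 10, 16, 16)
conv = nn.Conv2d(10, 20, kernel_size=5, padding=2, stride=3)
tconv = nn.ConvTranspose2d(20, 10, kernel_size=5, padding=2, stride=3)
# conv maps (16, 16) -> (6, 6); tconv maps (6, 6) -> (16, 16)
print(tconv(conv(X)).shape == X.shape)  # True

When flooring in the stride arithmetic prevents an exact round trip, ConvTranspose2d's output_padding argument can make up the difference.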

4.2 Padding 0, stride 1

The output size is n + k − 1: every input element contributes a full copy of the kernel.

4.3 Padding p, stride 1

The output size is n + k − 1 − 2p: p rows and columns are trimmed from each side.

4.4 Padding p, stride s

The output size is (n − 1) · s + k − 2p: the kernel copies are placed s positions apart before trimming.
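To see where the name "transpose" comes from (a sketch along the lines of the d2l chapter; kernel2matrix is a helper I am adding for illustration): write the convolution as a matrix multiplication y = Wx over the flattened input; multiplying by the transposed matrix W.T then performs exactly the transpose convolution.

import torch

def kernel2matrix(K):
	# Rewrite the 2 x 2 kernel K as a 4 x 9 matrix W so that convolving
	# a 3 x 3 input X equals W @ X.reshape(9)
	k, W = torch.zeros(5), torch.zeros((4, 9))
	k[:2], k[3:5] = K[0, :], K[1, :]
	W[0, 0:5], W[1, 1:6], W[2, 3:8], W[3, 4:9] = k, k, k, k
	return W

K = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
W = kernel2matrix(K)
Y = torch.arange(4.0).reshape(2, 2)
# Multiplying the flattened 2 x 2 input by W.T gives the same 3 x 3 result
# as the tran_conv function from Section 2
Z = (W.T @ Y.reshape(-1)).reshape(3, 3)
print(Z == tran_conv(Y, K))  # all True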

5. Initialization of transpose convolution

Like an ordinary convolution, a transpose convolution needs its kernel initialized. For upsampling layers the kernel is commonly initialized with bilinear interpolation; the code is as follows.
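Written as a formula (my restatement of what the code below computes), the bilinear kernel is the product of two triangular ramps:

$$W[i,j] = \Big(1 - \frac{|i - c|}{f}\Big)\Big(1 - \frac{|j - c|}{f}\Big), \qquad f = \Big\lfloor \frac{k+1}{2} \Big\rfloor, \qquad c = \begin{cases} f - 1 & k \text{ odd} \\ f - 0.5 & k \text{ even} \end{cases}$$

For k = 4 this gives f = 2 and c = 1.5, so the ramp values are (0.25, 0.75, 0.75, 0.25) and their outer product is the 4 x 4 matrix seen in the results below.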

  • Code
# -*- coding: utf-8 -*-
# @Project: zc
# @Author: zc
# @File name: os_test
# @Create time: 2022/1/4 8:38


import torch


def bilinear_kernel(in_channels, out_channels, kernel_size):
	"""Build a bilinear-interpolation kernel.

	:param in_channels: number of input channels
	:param out_channels: number of output channels
	:param kernel_size: convolution kernel size
	:return: weight tensor of shape (in_channels, out_channels, kernel_size, kernel_size)
	"""
	# For kernel_size = 4: factor = 2
	factor = (kernel_size + 1) // 2
	# For kernel_size = 4: center = 1.5
	if kernel_size % 2 == 1:
		center = factor - 1
	else:
		center = factor - 0.5
	# Create a tuple of index grids: og[0] has shape (kernel_size, 1),
	# og[1] has shape (1, kernel_size)
	og = (torch.arange(kernel_size).reshape(-1, 1),
		  torch.arange(kernel_size).reshape(1, -1))
	# Bilinear interpolation: for kernel_size = 4 this produces a 4 x 4 matrix
	filt = (1 - torch.abs(og[0] - center) / factor) * (1 - torch.abs(og[1] - center) / factor)
	# Allocate an all-zero weight tensor of shape
	# (in_channels, out_channels, kernel_size, kernel_size)
	weight = torch.zeros((in_channels, out_channels,
						  kernel_size, kernel_size))
	# Place filt on the channel diagonal, i.e. weight[c, c] = filt:
	# [[filt,  0,    0  ],
	#  [ 0,   filt,  0  ],
	#  [ 0,    0,   filt]]
	weight[range(in_channels), range(out_channels), :, :] = filt
	return weight


# y: [3, 3, 4, 4]
y = bilinear_kernel(3, 3, 4)

print(f'y={y}')
print(f'y_shape={y.shape}')
print(f'y0={y[0]}')
print(f'y1={y[1]}')
print(f'y2={y[2]}')
  • result
y=tensor([[[[0.0625, 0.1875, 0.1875, 0.0625],
          [0.1875, 0.5625, 0.5625, 0.1875],
          [0.1875, 0.5625, 0.5625, 0.1875],
          [0.0625, 0.1875, 0.1875, 0.0625]],

         [[0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000]],

         [[0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000]]],


        [[[0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000]],

         [[0.0625, 0.1875, 0.1875, 0.0625],
          [0.1875, 0.5625, 0.5625, 0.1875],
          [0.1875, 0.5625, 0.5625, 0.1875],
          [0.0625, 0.1875, 0.1875, 0.0625]],

         [[0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000]]],


        [[[0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000]],

         [[0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 0.0000]],

         [[0.0625, 0.1875, 0.1875, 0.0625],
          [0.1875, 0.5625, 0.5625, 0.1875],
          [0.1875, 0.5625, 0.5625, 0.1875],
          [0.0625, 0.1875, 0.1875, 0.0625]]]])
y_shape=torch.Size([3, 3, 4, 4])
y0=tensor([[[0.0625, 0.1875, 0.1875, 0.0625],
         [0.1875, 0.5625, 0.5625, 0.1875],
         [0.1875, 0.5625, 0.5625, 0.1875],
         [0.0625, 0.1875, 0.1875, 0.0625]],

        [[0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000]],

        [[0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000]]])
y1=tensor([[[0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000]],

        [[0.0625, 0.1875, 0.1875, 0.0625],
         [0.1875, 0.5625, 0.5625, 0.1875],
         [0.1875, 0.5625, 0.5625, 0.1875],
         [0.0625, 0.1875, 0.1875, 0.0625]],

        [[0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000]]])
y2=tensor([[[0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000]],

        [[0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000]],

        [[0.0625, 0.1875, 0.1875, 0.0625],
         [0.1875, 0.5625, 0.5625, 0.1875],
         [0.1875, 0.5625, 0.5625, 0.1875],
         [0.0625, 0.1875, 0.1875, 0.0625]]])

6. Applying transpose convolution to images

With a transpose convolution we can double the height and width of an image.
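A quick size check before the code (my arithmetic, using PyTorch's output-size formula for ConvTranspose2d): with kernel_size=4, padding=1, stride=2 the output height is

(h − 1) · 2 − 2 · 1 + 4 = 2h,

so a 256 x 256 input comes out as 512 x 512, exactly doubled.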

  • Code
# -*- coding: utf-8 -*-
# @Project: zc
# @Author: zc
# @File name: os_test
# @Create time: 2022/1/4 8:38
import os
import matplotlib.pyplot as plt
import torch
import torchvision.transforms
from torch import nn
from d2l import torch as d2l


# bilinear_kernel(in_channels, out_channels, kernel_size): reuse the
# definition from Section 5 (omitted here to avoid repeating it)


conv_trans = nn.ConvTranspose2d(3, 3, kernel_size=4, padding=1, stride=2, bias=False)
conv_trans.weight.data.copy_(bilinear_kernel(3, 3, 4))
path = os.path.join(os.getcwd(), 'img', 'banana.jpg')  # e.g. 'D:\\zc\\img\\banana.jpg'
print(f'path={path}')
# in_img: (3, 256, 256)
in_img = torchvision.transforms.ToTensor()(d2l.Image.open(path))
# X: (1, 3, 256, 256)
X = in_img.unsqueeze(0)
# Y: (1, 3, 512, 512) -- the transpose convolution has stride=2, so the
# height and width are doubled
Y = conv_trans(X)
# out_img: [512, 512, 3]
out_img = Y[0].permute(1, 2, 0).detach()
d2l.set_figsize()
# in_img.permute(1, 2, 0): [256, 256, 3]
print('input image shape:', in_img.permute(1, 2, 0).shape)
# d2l.plt.imshow(in_img.permute(1, 2, 0))
print('output_image_shape:', out_img.shape)
d2l.plt.imshow(out_img)
print('output_image_shape_after:', out_img.shape)
plt.show()
  • result
path=D:\zc\img\banana.jpg
input image shape: torch.Size([256, 256, 3])
output_image_shape: torch.Size([512, 512, 3])
output_image_shape_after: torch.Size([512, 512, 3])
