
Deep Learning Series 47: StyleGAN Summary

2022-06-23 07:12:00 IE06

1. StyleGAN 1

One problem with GANs is that it is hard to generate high-resolution images. In 2018, NVIDIA's ProGAN was the first to solve this challenge. It starts by training the generator and discriminator on very low-resolution images (such as 4×4) and then adds one higher-resolution layer at a time. The initial input has shape [512, 4, 4] and the final output is [3, 1024, 1024], for a total of 18 layers:
18 = 1 (initial conv layer) + 8 × 2 (eight blocks of two convolution layers each, growing the feature maps from [4,4] to [1024,1024]) + 1 (to_rgb layer, which converts the channel count to 3)
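As a quick sanity check of that layer count, here is a minimal sketch (plain Python, no framework assumed) that walks the progressive resolutions from 4×4 to 1024×1024 and tallies the layers the same way as the breakdown above.

```python
# Count ProGAN generator layers: 1 initial conv + 2 convs per upsampling block + 1 to_rgb.
resolutions = []
res = 4
while res <= 1024:
    resolutions.append(res)
    res *= 2

num_blocks = len(resolutions) - 1          # 8 upsampling blocks: 4->8, ..., 512->1024
total_layers = 1 + 2 * num_blocks + 1      # 1 + 16 + 1 = 18

print(resolutions)    # [4, 8, 16, 32, 64, 128, 256, 512, 1024]
print(total_layers)   # 18
```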
StyleGAN is an upgraded version of ProGAN:
1) An 8-layer fully connected (FC) mapping network encodes the input latent vector into an intermediate vector w. Similar in spirit to a PCA transform, this decouples different features (hair, eyes, nose, etc.). Without it, the model suffers from the "feature entanglement" problem, where the way features are distributed in the training data causes one feature to affect others.
2) The intermediate vector w, after passing through a learned affine transform A, is fed into every layer in layer-specific proportions, where it scales and shifts each channel. This is called the AdaIN module. The scales and shifts store the style information; different values produce different styles (a code sketch of the mapping network, AdaIN, and noise injection follows this list).
Because the synthesis network has 18 layers, we say that w is turned into 18 control vectors, each controlling different visual features.

3) Noise is not added to the original latent vector; instead, it is injected into every layer of the synthesis network with learned per-layer scaling, adding stochastic variation to fine details.
4) Two intermediate vectors w are used during training, denoted source A and source B, each driving a different subset of network layers:
When the coarse layers (4×4 to 8×8) use B's style and the rest use A's, the identity of the image follows source B, while details such as skin color follow source A;
When the middle layers (16×16 to 32×32) use B's style, the generated image no longer carries B's identity; hairstyle, pose, and so on change, but the skin color still follows A;
When the fine layers (64×64 to 1024×1024) use B's style, the identity follows A and the skin color follows B.
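The following is a minimal PyTorch-style sketch of the pieces described in 1)–3): an 8-layer mapping network from z to w, an AdaIN module driven by a learned affine transform A, per-layer noise injection, and a style-mixing split between two intermediate vectors as in 4). Layer widths, the LeakyReLU slope, and the mixing split point are illustrative assumptions, not the official implementation.

```python
import torch
import torch.nn as nn

class MappingNetwork(nn.Module):
    """8 FC layers that map the latent z to the intermediate vector w."""
    def __init__(self, dim=512, n_layers=8):
        super().__init__()
        layers = []
        for _ in range(n_layers):
            layers += [nn.Linear(dim, dim), nn.LeakyReLU(0.2)]
        self.net = nn.Sequential(*layers)

    def forward(self, z):
        return self.net(z)

class AdaIN(nn.Module):
    """Learned affine transform A turns w into a per-channel scale and bias."""
    def __init__(self, channels, w_dim=512):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels)
        self.affine = nn.Linear(w_dim, 2 * channels)   # the "A" transform

    def forward(self, x, w):
        scale, bias = self.affine(w).chunk(2, dim=1)
        x = self.norm(x)
        return x * (1 + scale[:, :, None, None]) + bias[:, :, None, None]

class NoiseInjection(nn.Module):
    """Per-layer noise, added with a learned per-channel weight."""
    def __init__(self, channels):
        super().__init__()
        self.weight = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, x):
        noise = torch.randn(x.shape[0], 1, x.shape[2], x.shape[3], device=x.device)
        return x + self.weight * noise

# Style mixing: use w_b for the coarse layers and w_a for the rest (illustrative split).
mapping = MappingNetwork()
w_a, w_b = mapping(torch.randn(1, 512)), mapping(torch.randn(1, 512))
styles = [w_b if layer < 4 else w_a for layer in range(18)]   # 18 per-layer control vectors
```

In the full generator, AdaIN and noise injection are applied after every convolution of the synthesis network, once per layer, which is where the 18 control vectors come from.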

2. StyleGAN 2

StyleGAN2 mainly addresses the "water droplet" artifacts that tend to appear in images generated by StyleGAN. The cause of the droplets is the AdaIN operation: AdaIN normalizes each feature map separately, which can destroy the relative magnitude information between feature maps and produce these artifacts. Once AdaIN is removed, the problem disappears.
The main modifications are:
1) Remove the redundant data processing at the beginning of each style block
2) When normalizing the features, use only the standard deviation and drop the mean
3) Add the noise module outside the style block instead of inside it
4) Add weight demodulation, which rescales the convolution weights so that the output features remain at a consistent scale after style modulation (see the sketch after this list).
5) Lazy regularization: the regularization terms are optimized only once every 16 minibatches, which reduces computation with no noticeable effect on quality.
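Below is a minimal sketch of modulated convolution with weight demodulation, as described in 4): the per-sample style scale s is folded into the convolution weights, and the weights are then rescaled so that each output feature map returns to roughly unit standard deviation, removing the need for data-dependent normalization. This is an illustrative PyTorch version; the shapes and the grouped-convolution trick are assumptions about a reasonable implementation, not the official code.

```python
import torch
import torch.nn.functional as F

def modulated_conv2d(x, weight, s, demodulate=True, eps=1e-8):
    """x: [B, Cin, H, W], weight: [Cout, Cin, k, k], s: [B, Cin] per-sample style scales."""
    B, Cin, H, W = x.shape
    Cout = weight.shape[0]

    # Modulation: scale the input channels of the weights by the style s.
    w = weight[None] * s[:, None, :, None, None]            # [B, Cout, Cin, k, k]

    if demodulate:
        # Demodulation: rescale so each output channel has ~unit standard deviation.
        d = torch.rsqrt((w ** 2).sum(dim=[2, 3, 4]) + eps)   # [B, Cout]
        w = w * d[:, :, None, None, None]

    # Fold the batch into the group dimension so each sample gets its own weights.
    x = x.reshape(1, B * Cin, H, W)
    w = w.reshape(B * Cout, Cin, *weight.shape[2:])
    out = F.conv2d(x, w, padding=weight.shape[-1] // 2, groups=B)
    return out.reshape(B, Cout, H, W)
```

Lazy regularization from 5) is not shown here; it simply skips the regularization loss terms on most minibatches and evaluates them only every 16th one.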

3. StyleGAN 3

During GAN synthesis, some features come to depend on absolute pixel coordinates, which makes details appear to stick to the image coordinates rather than to the surface of the generated object. This reflects a common problem with GAN models: the generation process is not truly hierarchical. The coarse features (output by the shallow layers) mainly control whether the fine features (output by the deep layers) are present, without precisely controlling where they appear.
Current generator architectures are built from convolutions, nonlinearities, upsampling, and similar operations, and such an architecture cannot achieve good equivariance (illustrated below).
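Equivariance here means that transforming the input and then applying an operation gives the same result as applying the operation first and then transforming its output, i.e. f(T(x)) = T(f(x)). The following is a tiny NumPy illustration of that check for circular translation; it is only a conceptual sketch, not StyleGAN3 code.

```python
import numpy as np

def translation_equivariance_error(op, x, shift=(3, 5)):
    """Max difference between op(shift(x)) and shift(op(x)); 0 means equivariant."""
    shifted_first = op(np.roll(x, shift, axis=(0, 1)))
    shifted_after = np.roll(op(x), shift, axis=(0, 1))
    return np.abs(shifted_first - shifted_after).max()

x = np.random.rand(64, 64)
pointwise = lambda img: np.tanh(img)   # a pointwise nonlinearity commutes with translation
print(translation_equivariance_error(pointwise, x))   # ~0.0
```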
StyleGAN3 fundamentally solves StyleGAN2's problem of features sticking to image coordinates, achieving genuine translation and rotation equivariance and greatly improving the quality of synthesized images.
(1) Use Fourier features instead of StyleGAN2's learned constant input to the generator (see the sketch after this list)
(2) Remove the noise inputs (the positional information of a feature should come from the preceding coarse features)
(3) Reduce the network depth (14 layers, down from 18), disable mixing regularization and path length regularization, and use a simple normalization before each convolution (this partially reverses some of StyleGAN2's design decisions)
(4) Replace bilinear upsampling with an approximation of the ideal low-pass filter.
(5) To make the network rotation equivariant, two changes are made: the convolution kernel size of all layers is changed from 3×3 to 1×1, and the number of feature maps is doubled to compensate for the reduced capacity.
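Here is a minimal sketch of replacing the learned constant input with Fourier features, as in change (1): sample a set of 2D frequencies and evaluate sines and cosines on a coordinate grid. The grid size, frequency count, and sampling bandwidth are illustrative assumptions, and the per-sample transformation of the frequencies driven by w is omitted.

```python
import math
import torch

def fourier_feature_input(batch, size=36, n_freqs=64, bandwidth=2.0):
    """Build a [batch, 2*n_freqs, size, size] input from sinusoids over a coordinate grid."""
    coords = torch.linspace(-1, 1, size)
    y, x = torch.meshgrid(coords, coords, indexing="ij")       # [size, size] each
    grid = torch.stack([x, y], dim=-1)                         # [size, size, 2]

    freqs = torch.randn(n_freqs, 2) * bandwidth                # random 2D frequencies
    phase = 2 * math.pi * grid @ freqs.t()                     # [size, size, n_freqs]
    feats = torch.cat([phase.sin(), phase.cos()], dim=-1)      # [size, size, 2*n_freqs]
    return feats.permute(2, 0, 1).unsqueeze(0).expand(batch, -1, -1, -1)
```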

4. StyleGAN functionality

GANs have a smooth, continuous latent space, unlike a VAE (Variational Auto-Encoder), where gaps can appear. Therefore, if you pick two points f1 and f2 in the latent space, you get two different faces, and you can create a transition, or interpolation, between the two faces by following a linear path between the two points.
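A minimal sketch of this linear interpolation between two latent points, assuming a pretrained generator G is available (the G(z) call here is a placeholder assumption for whatever model you load):

```python
import torch

def interpolate_latents(G, z1, z2, steps=8):
    """Generate images along the straight line between two latent codes z1 and z2."""
    frames = []
    for t in torch.linspace(0, 1, steps):
        z = (1 - t) * z1 + t * z2          # linear interpolation in latent space
        frames.append(G(z))                # placeholder call: G maps a latent to an image
    return frames
```

In practice the interpolation is often performed in the intermediate w space rather than in z, since w tends to be better disentangled.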

Original site: https://yzsam.com/2022/174/202206230618283622.html

Copyright notice: this article was created by [IE06]; please include a link to the original when reposting.