Learning Pyramid-Context Encoder Network for High-Quality Image Inpainting: paper notes
2022-07-24 05:00:00 【Magic__ Conch】
IEEE Conference Proceedings arXiv: Computer Vision and Pattern Recognition Jan 2019
Problem addressed and improvements
Existing methods cannot combine direct visual information with deep semantic information.
- Patch-search methods lack an understanding of high-level semantic consistency.
- Generative models built from stacked convolutions and poolings tend to produce over-smoothed results that lack visual realism.
Model structure
With U-Net as the backbone, the missing region is filled at both the image level and the feature level.
Pyramid-context encoder: uses a cross-layer attention transfer mechanism together with pyramid filling.

At each level, the reconstructed feature $\psi$ is obtained by passing that level's encoder feature map $\phi$ and the $\psi$ of the next higher level through the ATN (denoted $f$ in the formula).
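The top-down chaining described here can be sketched in a few lines. This is only an illustration of the recursion, not the paper's implementation: `pyramid_fill` and the argument layout are hypothetical names, and starting the chain from the deepest encoder feature is an assumption on my part.

```python
def pyramid_fill(phis, f):
    """Chain the ATN from the deepest level down to the shallowest.

    phis: encoder features ordered shallow to deep, [phi^1, ..., phi^L].
    f:    the ATN, called as f(this_level_feature, higher_level_guide).
    Returns the reconstructed features [psi^1, ..., psi^(L-1)].
    """
    guide = phis[-1]  # assumption: the deepest feature starts the chain
    psis = []
    for phi in reversed(phis[:-1]):
        guide = f(phi, guide)  # psi of this level becomes the next guide
        psis.append(guide)
    psis.reverse()  # return in shallow-to-deep order
    return psis


# Symbolic usage: strings stand in for feature maps to show the nesting.
trace = pyramid_fill(["phi1", "phi2", "phi3"], lambda p, g: f"f({p},{g})")
print(trace)
```

Using strings instead of tensors makes the nesting visible: each level's result is fed as the guide to the level below it.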
Attention Transfer Network (ATN) (the $f$ above)
1. Use the feature map reconstructed from high-level semantics, $\psi^L$, to fill the feature map of the next lower level, $\phi^{L-1}$, which gives that level's reconstructed feature map $\psi^{L-1}$.
First extract patches from $\psi^l$, then compute the cosine similarity between patches.

Then apply a softmax to the similarities to obtain the attention score of each patch.

Once the attention scores have been obtained from the high-level semantic features (the $\alpha_{i,j}^l$ in the formula above), the missing region in the feature map of the next lower level can be filled with context weighted by those scores.

After all patches have been computed, we obtain $\psi^{l-1}$ (all of the computations above can be formulated as convolutions, so the network can be trained end to end).
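The three steps above (cosine similarity, softmax, weighted fill) can be sketched on flattened patches. This is a minimal pure-Python illustration rather than the convolutional form used for training. The function names are hypothetical, and the split into patches inside the missing region and context patches outside it is my reading of the method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)  # epsilon avoids division by zero


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_transfer(high_missing, high_context, low_context):
    """Fill low-level patches of the missing region.

    high_missing: flattened patches of psi^l inside the missing region
    high_context: flattened patches of psi^l outside the missing region
    low_context:  flattened patches of phi^(l-1) at the same context positions
    """
    dim = len(low_context[0])
    filled = []
    for p in high_missing:
        # attention scores come from the high-level feature only
        scores = softmax([cosine(p, q) for q in high_context])
        # the low-level patch is a score-weighted sum of low-level context
        patch = [sum(s * c[k] for s, c in zip(scores, low_context))
                 for k in range(dim)]
        filled.append(patch)
    return filled
```

The point of the design is that similarity is measured where semantics are reliable (the deeper feature), while the content being copied comes from the shallower feature, which carries finer texture.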
2. Refinement
Multi-scale context information is aggregated by four groups of dilated convolutions with different rates. This design keeps the structure of the final reconstructed feature consistent with its surrounding context, which improves the inpainting results at test time.
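To show what aggregating dilated convolutions at several rates means, here is a 1D toy sketch. It is not the network's layer: the rates `(1, 2, 4, 8)`, the shared kernel, and the function names are illustrative assumptions, and the real model works on 2D feature maps with learned weights.

```python
def dilated_conv1d(signal, kernel, rate):
    """Same-size 1D cross-correlation with zero padding and a dilation rate."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * rate  # taps are spaced `rate` samples apart
            if 0 <= j < len(signal):   # positions outside the signal count as zero
                acc += w * signal[j]
        out.append(acc)
    return out


def aggregate(signal, kernel, rates=(1, 2, 4, 8)):
    """Run one branch per rate and stack the branch outputs per position."""
    branches = [dilated_conv1d(signal, kernel, r) for r in rates]
    return [[b[i] for b in branches] for i in range(len(signal))]


# An impulse in the middle of the signal: larger rates see it from farther away.
impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
features = aggregate(impulse, [1, 1, 1])
print(features[0])  # branch responses at position 0
```

Each position ends up with one value per rate, so nearby and distant context are both available to later layers without increasing the kernel size.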
Multi-scale decoder
- The multi-scale decoder takes the features reconstructed by the ATN and the encoder's latent feature as input.
- The decoder feature maps $\varphi^{L-1}$, $\varphi^{L-2}$, etc. are computed by the following formula.

Here, the reconstructed features generated by the ATN encode lower-level information for the missing region, which helps produce visually realistic results with fine-grained details. The compact latent feature extracted by convolution can synthesize new objects when no suitable object can be found in the area outside the missing region.
Semantic consistency comes from the deep convolutional features, while texture consistency comes from the shallow features reconstructed by the ATN.
Pyramid L1 losses

Adversarial training loss
The adversarial loss has two parts: a generator loss and a discriminator loss.
- PatchGAN (from Image-to-Image Translation with Conditional Adversarial Networks) is used as the discriminator in this paper, with spectral normalization to stabilize training.
- In this paper, the pyramid-context encoder and the multi-scale decoder together form the generator.
Definition of the loss functions:
The generator's final prediction $z$ is defined as:
$$z = G(x \odot (1 - M), M) \odot M + x \odot (1 - M)$$
The discriminator's adversarial loss can be expressed as:

The generator's adversarial loss is:

PEN-Net is optimized by minimizing the adversarial loss and the pyramid L1 loss (given at the end of the previous section). The overall objective function is:
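As a numeric illustration of the quantities in this section, the sketch below composes $z$ from the mask and combines a generator adversarial term with the pyramid L1 term. Several parts are assumptions and not taken from these notes: the hinge form of the adversarial losses (a common pairing with spectral normalization), the weights `w_adv` and `w_pyr`, and all function names.

```python
def compose(x, generated, mask):
    """z = G(...) * M + x * (1 - M), element-wise; mask is 1 inside the hole."""
    return [[g * m + p * (1 - m) for p, g, m in zip(rx, rg, rm)]
            for rx, rg, rm in zip(x, generated, mask)]


def d_hinge_loss(real_scores, fake_scores):
    """Assumed hinge loss for the discriminator (not confirmed by the notes)."""
    real = sum(max(0.0, 1.0 - s) for s in real_scores) / len(real_scores)
    fake = sum(max(0.0, 1.0 + s) for s in fake_scores) / len(fake_scores)
    return real + fake


def g_hinge_loss(fake_scores):
    """Assumed hinge-style generator loss: raise the scores of generated patches."""
    return -sum(fake_scores) / len(fake_scores)


def total_generator_loss(g_adv, pyramid_l1, w_adv=1.0, w_pyr=1.0):
    """Weighted sum of both terms; the weights here are hypothetical."""
    return w_adv * g_adv + w_pyr * pyramid_l1


# Known pixels of x are kept; only the masked pixel takes the generated value.
z = compose([[1, 2], [3, 4]], [[9, 9], [9, 9]], [[0, 1], [0, 0]])
print(z)
```

Because PatchGAN scores patches, not whole images, `real_scores` and `fake_scores` stand for the per-patch outputs of the discriminator.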

Model analysis
This section analyzes the role of two network components: the pyramid L1 loss and the ATN.
Pyramid L1 loss
The pyramid L1 loss refines the prediction progressively at each scale, which helps the decoder decode the compact features layer by layer.
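A loss of this kind can be sketched as one L1 term per scale. The details below are assumptions for illustration: predictions are single-channel grids ordered from finest to coarsest, each half the size of the previous one, the target is downsampled by 2x2 average pooling, and each term is a mean absolute difference. The paper's exact normalization and how features are decoded to images at each scale are not covered in these notes.

```python
def downsample2x(img):
    """Average-pool a 2D grid (list of rows) by a factor of 2."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1]
              + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(w)] for r in range(h)]


def l1(a, b):
    """Mean absolute difference between two same-size grids."""
    n = len(a) * len(a[0])
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def pyramid_l1(predictions, target):
    """Sum the L1 terms over scales, shrinking the target to match each one."""
    loss = 0.0
    ref = target
    for i, pred in enumerate(predictions):
        if i > 0:
            ref = downsample2x(ref)  # target at the next coarser scale
        loss += l1(pred, ref)
    return loss


target = [[1.0] * 4 for _ in range(4)]
fine = [[1.0] * 4 for _ in range(4)]      # matches the target exactly
coarse = [[0.5] * 2 for _ in range(2)]    # off by 0.5 everywhere
print(pyramid_l1([fine, coarse], target))
```

Supervising every scale gives each decoder level its own error signal, so coarse structure and fine detail are both constrained.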
ATN
The cross-layer attention transfer mechanism improves on the U-Net backbone.
In the comparison figure, the first row is a plain U-Net without any attention mechanism, the second row is the CA method without deeper guidance, and the third row is the result of applying the ATN to the U-Net architecture.