
SAGAN: Self-Attention Generative Adversarial Networks (2018)

2022-06-23 23:49:00 HheeFish

Paper download
Open source code

0. Abstract

In this paper, we propose the Self-Attention Generative Adversarial Network (SAGAN), which allows attention-driven, long-range dependency modeling for image generation tasks. Traditional convolutional GANs generate high-resolution details as a function of only spatially local points in lower-resolution feature maps. In SAGAN, details can be generated using cues from all feature locations. Moreover, the discriminator can check that highly detailed features in distant portions of the image are consistent with each other. Furthermore, recent work has shown that generator conditioning affects GAN performance. Leveraging this insight, we apply spectral normalization to the GAN generator and find that this improves training dynamics. The proposed SAGAN performs better than prior work, boosting the best published Inception score (IS) from 36.8 to 52.52 and reducing the Fréchet Inception distance (FID) on the challenging ImageNet dataset from 27.62 to 18.65. Visualization of the attention layers shows that the generator leverages neighborhoods that correspond to object shapes rather than local regions of fixed shape.

1. Introduction


Figure 1. The proposed SAGAN generates images by using complementary features from distant portions of the image, rather than local regions of fixed shape, to produce consistent objects/scenes. In each row, the first image shows five representative query locations with color-coded dots. The other five images are the attention maps for those query locations, with the most-attended regions summarized by correspondingly color-coded arrows.

Image synthesis is an important problem in computer vision. There has been remarkable progress in this direction with the emergence of generative adversarial networks (GANs) (Goodfellow et al., 2014), though many open problems remain (Odena, 2019). GANs based on deep convolutional networks (Radford et al., 2016; Karras et al., 2018; Zhang et al.) have been especially successful. However, by carefully examining the samples generated by these models, we can observe that convolutional GANs (Odena et al., 2017; Miyato et al., 2018; Miyato & Koyama, 2018) have much more difficulty modeling some image classes than others when trained on multi-class datasets (e.g., ImageNet (Russakovsky et al., 2015)). For example, while the state-of-the-art ImageNet GAN model (Miyato & Koyama, 2018) excels at synthesizing image classes with few structural constraints (e.g., ocean, sky, and landscape classes, which are distinguished more by texture than by geometry), it fails to capture geometric or structural patterns that occur consistently in some classes (for example, dogs are often drawn with realistic fur texture but without clearly defined separate feet). One possible explanation is that previous models rely heavily on convolution to model the dependencies across different image regions. Since the convolution operator has a local receptive field, long-range dependencies can only be processed after passing through several convolutional layers. This can prevent learning about long-range dependencies for a variety of reasons: a small model may be unable to represent them, optimization algorithms may have trouble discovering parameter values that carefully coordinate multiple layers to capture these dependencies, and these parameterizations may be statistically brittle and prone to failure when applied to previously unseen inputs. Increasing the size of the convolution kernels can increase the representational capacity of the network, but doing so also loses the computational and statistical efficiency obtained by using a local convolutional structure. Self-attention (Cheng et al., 2016; Parikh et al., 2016; Vaswani et al., 2017), on the other hand, exhibits a better balance between the ability to model long-range dependencies and computational and statistical efficiency. The self-attention module calculates the response at a position as a weighted sum of the features at all positions, where the weights (or attention vectors) are calculated at only a small computational cost.
In this work, we propose Self-Attention Generative Adversarial Networks (SAGANs), which introduce a self-attention mechanism into convolutional GANs. The self-attention module is complementary to convolutions and helps model long-range, multi-level dependencies across image regions. Armed with self-attention, the generator can draw images in which fine details at every location are carefully coordinated with fine details in distant portions of the image. Moreover, the discriminator can also more accurately enforce complicated geometric constraints on the global image structure.
In addition to self-attention, we also incorporate recent insights relating network conditioning to GAN performance. The work by Odena et al. (2018) showed that well-conditioned generators tend to perform better. We propose enforcing good conditioning of GAN generators using the spectral normalization technique that has previously been applied only to the discriminator (Miyato et al., 2018).
We have conducted extensive experiments on the ImageNet dataset to validate the effectiveness of the proposed self-attention mechanism and stabilization techniques. SAGAN boosts the best published Inception score (IS) from 36.8 to 52.52 and reduces the Fréchet Inception distance from 27.62 to 18.65, significantly outperforming prior work in image synthesis. Visualization of the attention layers shows that the generator leverages neighborhoods that correspond to object shapes rather than local regions of fixed shape.

2. Related work

2.1. GANs

GANs have achieved great success in various image generation tasks, including image-to-image translation (Isola et al., 2017; Zhu et al., 2017; Taigman et al., 2017; Liu & Tuzel, 2016; Xue et al., 2018; Park et al., 2019), image super-resolution (Ledig et al., 2017; Snderby et al., 2017), and text-to-image synthesis (Reed et al., 2016b;a; Zhang et al., 2017; Hong et al., 2018). Despite this success, GAN training is known to be unstable and sensitive to the choice of hyperparameters. Several works have attempted to stabilize GAN training dynamics and improve sample diversity by designing new network architectures (Radford et al., 2016; Zhang et al., 2017; Karras et al., 2018; 2019), modifying the learning objectives and dynamics (Arjovsky et al., 2017; Salimans et al., 2018; Metz et al., 2017; Che et al., 2017; Zhao et al., 2017; Jolicoeur-Martineau, 2019), adding regularization methods (Gulrajani et al., 2017; Miyato et al., 2018), and introducing heuristic tricks (Salimans et al., 2016; Odena et al., 2017; Azadi et al., 2018). Recently, Miyato et al. (2018) proposed limiting the spectral norm of the weight matrices in the discriminator in order to constrain the Lipschitz constant of the discriminator function. Combined with a projection-based discriminator (Miyato & Koyama, 2018), the spectrally normalized model greatly improves class-conditional image generation on ImageNet.

2.2. Attention models

Recently, attention mechanisms have become an integral part of models that must capture global dependencies (Bahdanau et al., 2014; Xu et al., 2015; Yang et al., 2016; Gregor et al., 2015; Chen et al., 2018). In particular, self-attention (Cheng et al., 2016; Parikh et al., 2016), also called intra-attention, calculates the response at a position in a sequence by attending to all positions within the same sequence. Vaswani et al. (2017) demonstrated that machine translation models can achieve state-of-the-art results solely by using a self-attention model. Parmar et al. (2018) proposed an Image Transformer model that adds self-attention to an autoregressive model for image generation. Wang et al. (2018) formalized self-attention as a non-local operation to model spatio-temporal dependencies in video sequences. In spite of this progress, self-attention had not yet been explored in the context of GANs. (AttnGAN (Xu et al., 2018) uses attention over word embeddings within an input sequence, but not self-attention over internal model states.) SAGAN learns to efficiently find global, long-range dependencies within internal representations of images.

3. Self-Attention Generative Adversarial Networks


Figure 2. The proposed self-attention module for SAGAN. ⊗ denotes matrix multiplication. The softmax operation is performed on each row.

Most GAN-based models for image generation (Radford et al., 2016; Salimans et al., 2016; Karras et al., 2018) are built using convolutional layers. Convolution processes information in a local neighborhood, so using convolutional layers alone is computationally inefficient for modeling long-range dependencies in images. In this section, we adapt the non-local model of (Wang et al., 2018) to introduce self-attention into the GAN framework, enabling both the generator and the discriminator to efficiently model relationships between widely separated spatial regions. Because of its self-attention module (see Figure 2), we call the proposed method Self-Attention Generative Adversarial Networks (SAGAN).
The image features from the previous hidden layer $x \in \mathbb{R}^{C \times N}$ are first transformed into two feature spaces $f$ and $g$ to compute attention, where $f(x) = W_f x$ and $g(x) = W_g x$:
$$\beta_{j,i} = \frac{\exp(s_{ij})}{\sum_{i=1}^{N} \exp(s_{ij})}, \quad \text{where } s_{ij} = f(x_i)^{\top} g(x_j),$$
where $\beta_{j,i}$ indicates the extent to which the model attends to the $i$-th location when synthesizing the $j$-th region. Here, $C$ is the number of channels and $N$ is the number of feature locations in the features from the previous hidden layer. The output of the attention layer is $o = (o_1, o_2, \ldots, o_j, \ldots, o_N) \in \mathbb{R}^{C \times N}$, where
$$o_j = v\!\left(\sum_{i=1}^{N} \beta_{j,i}\, h(x_i)\right), \quad h(x_i) = W_h x_i, \quad v(x_i) = W_v x_i.$$
In the above formulation, $W_g \in \mathbb{R}^{\bar{C} \times C}$, $W_f \in \mathbb{R}^{\bar{C} \times C}$, $W_h \in \mathbb{R}^{\bar{C} \times C}$, and $W_v \in \mathbb{R}^{C \times \bar{C}}$ are learned weight matrices, implemented as 1×1 convolutions. Since we did not notice any significant performance decrease when reducing the number of channels $\bar{C}$ to $C/k$ (for $k = 1, 2, 4, 8$, after a few training epochs on ImageNet), we chose $k = 8$ (i.e., $\bar{C} = C/8$) in all our experiments for memory efficiency.
In addition, we further multiply the output of the attention layer by a scale parameter and add back the input feature map. Therefore, the final output is given by
$$y_i = \gamma\, o_i + x_i,$$
where $\gamma$ is a learnable scalar initialized to 0. Introducing the learnable $\gamma$ allows the network to first rely on the cues in the local neighborhood (since this is easier) and then gradually learn to assign more weight to non-local evidence. The intuition is straightforward: we want to learn the easy task first and then progressively increase the complexity of the task. In SAGAN, the proposed attention module is applied to both the generator and the discriminator, which are trained in an alternating fashion by minimizing the hinge version of the adversarial loss (Lim & Ye, 2017; Tran et al., 2017; Miyato et al., 2018):
$$L_D = -\,\mathbb{E}_{(x,y)\sim p_{\mathrm{data}}}\big[\min(0,\,-1 + D(x,y))\big] - \mathbb{E}_{z\sim p_z,\, y\sim p_{\mathrm{data}}}\big[\min(0,\,-1 - D(G(z),y))\big],$$
$$L_G = -\,\mathbb{E}_{z\sim p_z,\, y\sim p_{\mathrm{data}}}\big[D(G(z),y)\big].$$
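
For concreteness, below is a minimal PyTorch sketch of the self-attention module of Eqs. (1)-(3) together with the hinge losses of Eq. (4). It is an illustrative reimplementation based solely on the formulas above, not the authors' released code; the class and function names are our own, and implementation details used in some official releases (such as max-pooling inside the attention branch) are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SelfAttention(nn.Module):
    """Self-attention module of Eqs. (1)-(3): y = gamma * o + x."""

    def __init__(self, in_channels, k=8):
        super().__init__()
        c_bar = in_channels // k  # C_bar = C/k; k = 8 in the paper
        # W_f, W_g, W_h, W_v are implemented as 1x1 convolutions.
        self.f = nn.Conv2d(in_channels, c_bar, kernel_size=1)
        self.g = nn.Conv2d(in_channels, c_bar, kernel_size=1)
        self.h = nn.Conv2d(in_channels, c_bar, kernel_size=1)
        self.v = nn.Conv2d(c_bar, in_channels, kernel_size=1)
        # Learnable scalar gamma, initialized to 0 (Eq. 3).
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, height, width = x.shape
        n = height * width                        # N feature locations
        fx = self.f(x).view(b, -1, n)             # B x C_bar x N
        gx = self.g(x).view(b, -1, n)             # B x C_bar x N
        hx = self.h(x).view(b, -1, n)             # B x C_bar x N
        s = torch.bmm(fx.transpose(1, 2), gx)     # s_ij = f(x_i)^T g(x_j)
        beta = F.softmax(s, dim=1)                # softmax over i (Eq. 1)
        o = torch.bmm(hx, beta)                   # o_j = sum_i beta_ji h(x_i)
        o = self.v(o.view(b, -1, height, width))  # apply W_v (Eq. 2)
        return self.gamma * o + x                 # Eq. 3


def d_hinge_loss(d_real, d_fake):
    """Discriminator hinge loss (Eq. 4)."""
    return F.relu(1.0 - d_real).mean() + F.relu(1.0 + d_fake).mean()


def g_hinge_loss(d_fake):
    """Generator hinge loss (Eq. 4)."""
    return -d_fake.mean()
```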

4. Techniques to Stabilize GAN Training

We also investigate two techniques to stabilize the training of GANs on challenging datasets. First, we use spectral normalization (Miyato et al., 2018) in both the generator and the discriminator. Second, we confirm that the two-timescale update rule (TTUR) (Heusel et al., 2017) is effective, and we advocate using it specifically to address slow learning in regularized discriminators.

4.1. Spectral normalization of the generator and discriminator

Miyato et al. (2018) originally proposed stabilizing GAN training by applying spectral normalization to the discriminator network. Doing so constrains the Lipschitz constant of the discriminator by restricting the spectral norm of each layer. Compared with other normalization techniques, spectral normalization does not require extra hyperparameter tuning (setting the spectral norm of all weight layers to 1 consistently performs well in practice). Moreover, the computational cost is also relatively small.
We argue that the generator can also benefit from spectral normalization, based on recent evidence that the conditioning of the generator is an important causal factor in GAN performance (Odena et al., 2018). Spectral normalization in the generator can prevent the magnitude of the parameters from escalating and avoid unusual gradients. We find empirically that spectral normalization of both the generator and the discriminator makes it possible to use fewer discriminator updates per generator update, thus significantly reducing the computational cost of training. The approach also shows more stable training behavior.
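
As a hedged sketch of how this looks in practice: PyTorch ships a `torch.nn.utils.spectral_norm` wrapper that implements the power-iteration-based normalization of (Miyato et al., 2018). Wrapping every weight layer of both networks, as below, approximates the setup described here; the layer shapes are placeholders, not the paper's architecture.

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm


def sn_conv(in_ch, out_ch, **kw):
    """A conv layer with its weight spectrally normalized to 1."""
    return spectral_norm(nn.Conv2d(in_ch, out_ch, **kw))


# Illustrative blocks: G and D layers are wrapped identically.
generator_block = nn.Sequential(
    sn_conv(256, 128, kernel_size=3, padding=1),
    nn.BatchNorm2d(128),  # SAGAN uses conditional BN in G (see Sec. 5)
    nn.ReLU(),
)

discriminator_block = nn.Sequential(
    sn_conv(128, 256, kernel_size=4, stride=2, padding=1),
    nn.LeakyReLU(0.1),
)
```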

4.2. Imbalanced learning rates for generator and discriminator updates

In prior work, regularization of the discriminator (Miyato et al., 2018; Gulrajani et al., 2017) often slows the GAN learning process. In practice, methods using regularized discriminators typically require multiple (e.g., 5) discriminator update steps per generator update step during training. Independently, Heusel et al. (2017) advocated using separate learning rates for the generator and the discriminator (TTUR). We propose using TTUR specifically to compensate for the problem of slow learning in a regularized discriminator, making it possible to use fewer discriminator steps per generator step. Using this approach, we are able to produce better results given the same wall-clock time.

5. Experiments


Figure 4. 128×128 examples randomly generated by the baseline model and our models, "SN on G/D" and "SN on G/D+TTUR".

To evaluate the proposed methods, we conducted extensive experiments on the LSVRC2012 (ImageNet) dataset (Russakovsky et al., 2015). First, in Section 5.1, we present experiments evaluating the effectiveness of the two proposed techniques for stabilizing GAN training. Next, the proposed self-attention mechanism is investigated in Section 5.2. Finally, our SAGAN is compared with state-of-the-art methods (Odena et al., 2017; Miyato & Koyama, 2018) in Section 5.3. Models were trained for roughly 2 weeks on 4 GPUs each, using synchronous SGD (asynchronous SGD is known to be difficult; see, e.g., (Odena, 2016)).
Evaluation metrics.
We choose the Inception score (IS) (Salimans et al., 2016) and the Fréchet Inception distance (FID) (Heusel et al., 2017) for quantitative evaluation. Though alternatives exist (Zhou et al., 2019; Khrulkov & Oseledets, 2018; Olsson et al., 2018), they are not widely used. The Inception score (Salimans et al., 2016) computes the KL divergence between the conditional class distribution and the marginal class distribution. A higher Inception score indicates better image quality. We include the Inception score because it is widely used and thus allows comparison of our results with previous work. However, it is important to understand that the Inception score has serious limitations: it is intended primarily to ensure that the model generates samples that can be confidently recognized as belonging to a specific class, and that the model generates samples from many classes, not necessarily to assess realism of details or intra-class diversity. The FID is a more principled and comprehensive metric, and has been shown to be more consistent with human evaluation in assessing the realism and variation of the generated samples (Heusel et al., 2017). The FID calculates the Wasserstein-2 distance between the generated images and the real images in the feature space of an Inception-v3 network.
In addition to computing FID over the whole data distribution (i.e., over all 1000 classes of ImageNet images), we also compute FID between the generated images and the dataset images within each class (called intra FID (Miyato & Koyama, 2018)). Lower FID and intra FID values indicate that the synthetic data distribution is closer to the real data distribution. In all our experiments, 50k samples are randomly generated for each model to compute the Inception score, FID, and intra FID.
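
As a minimal sketch of the two metrics just described, assuming the Inception-v3 class probabilities and pooled features have already been extracted into arrays (the function names are ours): the IS is the exponentiated mean KL divergence between conditional and marginal class distributions, and the FID is the closed-form Wasserstein-2 distance between Gaussians fit to the real and generated features.

```python
import numpy as np
from scipy import linalg


def inception_score(probs, eps=1e-12):
    """IS from Inception class probabilities p(y|x), shape (n, 1000):
    exp( E_x [ KL( p(y|x) || p(y) ) ] ). Higher is better."""
    marginal = probs.mean(axis=0, keepdims=True)  # p(y)
    kl = (probs * (np.log(probs + eps) - np.log(marginal + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))


def frechet_inception_distance(real_feats, fake_feats):
    """FID between Inception-v3 feature sets, each of shape (n, d):
    ||mu_r - mu_f||^2 + Tr(C_r + C_f - 2 (C_r C_f)^{1/2}). Lower is better."""
    mu_r, mu_f = real_feats.mean(axis=0), fake_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_f = np.cov(fake_feats, rowvar=False)
    covmean = linalg.sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(covmean):  # drop tiny imaginary parts from sqrtm
        covmean = covmean.real
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean))
```

Intra FID then amounts to calling `frechet_inception_distance` separately on the features of each class.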
Network structures and implementation details. All SAGAN models we train are designed to generate 128×128 images. By default, spectral normalization (Miyato et al., 2018) is used for the layers in both the generator and the discriminator. Similar to (Miyato & Koyama, 2018), SAGAN uses conditional batch normalization in the generator and projection in the discriminator. For all models, we use the Adam optimizer (Kingma & Ba, 2015) with β1 = 0 and β2 = 0.9 for training. By default, the learning rate for the discriminator is 0.0004 and the learning rate for the generator is 0.0001.
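
A minimal sketch of the optimizer settings and the 1:1 update schedule described above and in Section 5.1; `generator`, `discriminator`, and `loader` are assumed to exist, the latent dimensionality is a placeholder, and the hinge-loss helpers are the ones sketched in Section 3.

```python
import torch

# TTUR: the discriminator learns 4x faster than the generator,
# allowing a single D step per G step (Secs. 4.2 and 5.1).
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-4, betas=(0.0, 0.9))
d_opt = torch.optim.Adam(discriminator.parameters(), lr=4e-4, betas=(0.0, 0.9))

for real_images, labels in loader:             # 1:1 balanced updates
    z = torch.randn(real_images.size(0), 128)  # placeholder latent dim
    fake_images = generator(z, labels)

    # Discriminator step (fake samples detached from G's graph).
    d_loss = d_hinge_loss(discriminator(real_images, labels),
                          discriminator(fake_images.detach(), labels))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator step.
    g_loss = g_hinge_loss(discriminator(fake_images, labels))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
```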

5.1. Evaluating the proposed stabilization techniques


Figure 3. Training curves for the baseline model and our models with the proposed stabilization techniques, "SN on G/D" and "SN on G/D+TTUR" (two-timescale learning rates). All models are trained with 1:1 balanced updates for G and D.

In this section, experiments are conducted to evaluate the effectiveness of the proposed stabilization techniques, i.e., applying spectral normalization (SN) to the generator and utilizing imbalanced learning rates (TTUR). In Figure 3, our models "SN on G/D" and "SN on G/D+TTUR" are compared with a baseline model implementing the state-of-the-art image generation method (Miyato et al., 2018). In this baseline model, SN is used only in the discriminator. When we train it with 1:1 balanced updates for the discriminator (D) and the generator (G), training becomes very unstable, as shown in the leftmost subfigure of Figure 3, exhibiting mode collapse very early in training. For example, the top-left subfigure of Figure 4 shows some images randomly generated by the baseline model at the 10k-th iteration. Although in the original paper (Miyato et al., 2018) this unstable training behavior is greatly mitigated by using a 5:1 imbalanced update for D and G, the ability to train stably with 1:1 updates is desirable for improving the convergence speed of the model. Thus, using our proposed techniques means the model can produce better results in the same wall-clock time, and there is no need to search for a suitable update ratio for the generator and discriminator. As shown in the middle subfigure of Figure 3, adding SN to both the generator and the discriminator greatly stabilizes our model "SN on G/D", even when it is trained with 1:1 balanced updates. However, the sample quality does not improve monotonically during training. For example, the image quality as measured by FID and IS begins to drop at the 260k-th iteration. Example images randomly generated by this model at different iterations can be found in Figure 4. When we also train the discriminator and generator with imbalanced learning rates, the image quality generated by our model "SN on G/D+TTUR" improves monotonically throughout training. As shown in Figures 3 and 4, we do not observe any significant decrease in sample quality, FID, or Inception score over one million training iterations. Thus, both the quantitative and qualitative results demonstrate the effectiveness of the proposed stabilization techniques for GAN training. They also show that the effects of the two techniques are at least partly additive. In the remaining experiments, all models use spectral normalization for both the generator and the discriminator, along with the imbalanced learning rates, and are trained with 1:1 updates for the generator and discriminator.

5.2. Self-Attention Mechanism


Table 1. Comparison of self-attention and residual blocks in GANs. These blocks are added to different layers of the network. All models are trained for one million iterations, and the best Inception scores (IS) and Fréchet Inception distances (FID) are reported. feat_k denotes adding self-attention to the k×k feature maps.


Figure 5. Visualization of attention maps. These images were generated by SAGAN. We visualize the attention maps of the last generator layer that uses attention, since this layer is closest to the output pixels and is the most straightforward to project into pixel space and interpret. In each cell, the first image shows three representative query locations, indicated with color-coded dots. The other three images are the attention maps for those query locations, with the most-attended regions summarized by correspondingly color-coded arrows. We observe that the network learns to allocate attention according to similarity of color and texture, rather than only spatial adjacency (see the top-left cell). We also find that although some query points are quite close in spatial location, their attention maps can be very different, as shown in the bottom-left cell. As shown in the top-right cell, SAGAN is able to draw dogs with clearly separated legs; the blue query point shows that attention helps to get the structure of the joint area correct. See the text for further discussion of the properties of the learned attention maps.

To explore the effect of the proposed self-attention mechanism, we build several SAGAN models by adding the self-attention mechanism at different stages of the generator and the discriminator (illustrated in the sketch below). As shown in Table 1, SAGAN models with the self-attention mechanism at middle-to-high-level feature maps (e.g., feat32 and feat64) achieve better performance than models with the self-attention mechanism at low-level feature maps (e.g., feat8 and feat16). For example, the FID of "SAGAN, feat8" is improved from 22.98 to 18.28 by "SAGAN, feat32". The reason is that self-attention receives more evidence and enjoys more freedom to choose conditions with larger feature maps (i.e., it is complementary to convolution for large feature maps), whereas it acts similarly to local convolution when modeling dependencies on small (e.g., 8×8) feature maps. This demonstrates that the attention mechanism gives both the generator and the discriminator more power to directly model long-range dependencies in the feature maps. In addition, the comparison of our SAGAN models with the baseline model without attention (the second column of Table 1) further shows the effectiveness of the proposed self-attention mechanism.
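
To make the placement options in Table 1 concrete, the following sketch inserts attention at the 32×32 feature maps of a generator, i.e., the "SAGAN, feat32" configuration. It reuses the illustrative `SelfAttention` module from the Section 3 sketch; the block structure and channel widths are placeholders, not the paper's exact architecture.

```python
import torch.nn as nn


def up_block(in_ch, out_ch):
    """Placeholder upsampling block (not the paper's exact block)."""
    return nn.Sequential(
        nn.Upsample(scale_factor=2),
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(),
    )


# "SAGAN, feat32": self-attention acts on the 32x32 feature maps.
generator_trunk = nn.Sequential(
    up_block(512, 256),   # 8x8   -> 16x16
    up_block(256, 128),   # 16x16 -> 32x32
    SelfAttention(128),   # attention over the 32x32 maps ("feat32")
    up_block(128, 64),    # 32x32 -> 64x64
    up_block(64, 32),     # 64x64 -> 128x128
    nn.Conv2d(32, 3, kernel_size=3, padding=1),
    nn.Tanh(),
)
```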
The self-attention blocks also achieve better results than residual blocks with the same number of parameters. For example, when we replace the self-attention block with a residual block on the 8×8 feature maps, training becomes unstable, leading to a significant drop in performance (e.g., FID increases from 22.98 to 42.13). Even in cases where training proceeds smoothly, replacing the self-attention block with a residual block still yields worse FID and Inception scores (e.g., FID 18.28 vs. 27.33 on the 32×32 feature maps). This comparison demonstrates that the performance improvement from using SAGAN is not simply due to an increase in model depth and capacity.
To better understand what has been learned during the generation process, we visualize the attention weights of the generator in SAGAN for different images. Some sample images with attention are shown in Figures 5 and 1. See the caption of Figure 5 for descriptions of some of the properties of the learned attention maps.

5.3. Comparison with the state of the art


Table 2. Comparison of the proposed SAGAN with state-of-the-art GAN models (Odena et al., 2017; Miyato & Koyama, 2018) for class-conditional image generation on ImageNet. The FID of SNGAN-projection is calculated from officially released weights.


Figure 6. 128×128 example images generated by SAGAN for different classes. Each row shows samples from one class. In the leftmost column, the intra FID of our SAGAN (left) and of the state-of-the-art method (Miyato & Koyama, 2018) (right) are listed.

We also compare our SAGAN with state-of-the-art GAN models (Odena et al., 2017; Miyato & Koyama, 2018) for class-conditional image generation on ImageNet. As shown in Table 2, the proposed SAGAN achieves the best Inception score, intra FID, and FID. It significantly improves the best published Inception score from 36.8 to 52.52. The lower FID (18.65) and intra FID (83.7) achieved by SAGAN also indicate that SAGAN can better approximate the original image distribution by using the self-attention module to model long-range dependencies between image regions.
Figure 6 shows comparison results and generated images for some representative classes of ImageNet. We observe that our SAGAN achieves better performance (i.e., lower intra FID) than the state-of-the-art GAN model (Miyato & Koyama, 2018) for synthesizing image classes with complex geometric or structural patterns, such as goldfish and Saint Bernard. For classes with few structural constraints (e.g., valley, stone wall, and coral fungus, which are distinguished more by texture than by geometry), our SAGAN shows less superiority over the baseline model (Miyato & Koyama, 2018). Again, the reason is that the self-attention in SAGAN is complementary to convolution for capturing long-range, global-level dependencies occurring consistently in geometric or structural patterns, but plays a role similar to local convolution when modeling dependencies for simple textures.

6. Conclusion

In this paper, we proposed Self-Attention Generative Adversarial Networks (SAGANs), which incorporate a self-attention mechanism into the GAN framework. The self-attention module is effective at modeling long-range dependencies. In addition, we showed that spectral normalization applied to the generator stabilizes GAN training and that TTUR speeds up the training of regularized discriminators. SAGAN achieves state-of-the-art performance on class-conditional image generation on ImageNet.

References

Arjovsky, M., Chintala, S., and Bottou, L. Wasserstein GAN. arXiv:1701.07875, 2017.
Azadi, S., Olsson, C., Darrell, T., Goodfellow, I., and Odena, A. Discriminator rejection sampling. arXiv preprint arXiv:1810.06758, 2018.
Bahdanau, D., Cho, K., and Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv:1409.0473, 2014.
Brock, A., Donahue, J., and Simonyan, K. Large scale GAN training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096, 2018.
Che, T., Li, Y., Jacob, A. P., Bengio, Y., and Li, W. Mode regularized generative adversarial networks. In ICLR, 2017.
Chen, X., Mishra, N., Rohaninejad, M., and Abbeel, P. PixelSNAIL: An improved autoregressive generative model. In ICML, 2018.
Cheng, J., Dong, L., and Lapata, M. Long short-term memory-networks for machine reading. In EMNLP, 2016.
Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A. C., and Bengio, Y. Generative adversarial nets. In NIPS, 2014.
Gregor, K., Danihelka, I., Graves, A., Rezende, D. J., and Wierstra, D. DRAW: A recurrent neural network for image generation. In ICML, 2015.
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., and Courville, A. C. Improved training of Wasserstein GANs. In NIPS, 2017.
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., and Hochreiter, S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In NIPS, pp. 6629–6640, 2017.
Hong, S., Yang, D., Choi, J., and Lee, H. Inferring semantic layout for hierarchical text-to-image synthesis. In CVPR, 2018.
Isola, P., Zhu, J.-Y., Zhou, T., and Efros, A. A. Image-to-image translation with conditional adversarial networks. In CVPR, 2017.
Jolicoeur-Martineau, A. The relativistic discriminator: a key element missing from standard GAN. In ICLR, 2019.
Karras, T., Aila, T., Laine, S., and Lehtinen, J. Progressive growing of GANs for improved quality, stability, and variation. In ICLR, 2018.
Karras, T., Laine, S., and Aila, T. A style-based generator architecture for generative adversarial networks. In CVPR, 2019.
Khrulkov, V. and Oseledets, I. Geometry score: A method for comparing generative adversarial networks. arXiv preprint arXiv:1802.02664, 2018.
Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. In ICLR, 2015.
Ledig, C., Theis, L., Huszar, F., Caballero, J., Aitken, A., Tejani, A., Totz, J., Wang, Z., and Shi, W. Photo-realistic single image super-resolution using a generative adversarial network. In CVPR, 2017.
Lim, J. H. and Ye, J. C. Geometric GAN. arXiv:1705.02894, 2017.
Liu, M. and Tuzel, O. Coupled generative adversarial networks. In NIPS, 2016.
Metz, L., Poole, B., Pfau, D., and Sohl-Dickstein, J. Unrolled generative adversarial networks. In ICLR, 2017.
Miyato, T. and Koyama, M. cGANs with projection discriminator. In ICLR, 2018.
Miyato, T., Kataoka, T., Koyama, M., and Yoshida, Y. Spectral normalization for generative adversarial networks. In ICLR, 2018.
Odena, A. Faster asynchronous SGD. arXiv preprint arXiv:1601.04033, 2016.
Odena, A. Open questions about generative adversarial networks. Distill, 2019. doi: 10.23915/distill.00018. https://distill.pub/2019/gan-open-problems.
Odena, A., Olah, C., and Shlens, J. Conditional image synthesis with auxiliary classifier GANs. In ICML, 2017.
Odena, A., Buckman, J., Olsson, C., Brown, T. B., Olah, C., Raffel, C., and Goodfellow, I. Is generator conditioning causally related to GAN performance? In ICML, 2018.
Olsson, C., Bhupatiraju, S., Brown, T., Odena, A., and Goodfellow, I. Skill rating for generative models. arXiv preprint arXiv:1808.04888, 2018.
Parikh, A. P., Täckström, O., Das, D., and Uszkoreit, J. A decomposable attention model for natural language inference. In EMNLP, 2016.
Park, T., Liu, M., Wang, T., and Zhu, J. Semantic image synthesis with spatially-adaptive normalization. In CVPR, 2019.
Parmar, N., Vaswani, A., Uszkoreit, J., Kaiser, Ł., Shazeer, N., and Ku, A. Image transformer. arXiv:1802.05751, 2018.
Radford, A., Metz, L., and Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. In ICLR, 2016.
Reed, S., Akata, Z., Mohan, S., Tenka, S., Schiele, B., and Lee, H. Learning what and where to draw. In NIPS, 2016a.
Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., and Lee, H. Generative adversarial text-to-image synthesis. In ICML, 2016b.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A. C., and Fei-Fei, L. ImageNet large scale visual recognition challenge. IJCV, 2015.
Salimans, T., Goodfellow, I. J., Zaremba, W., Cheung, V., Radford, A., and Chen, X. Improved techniques for training GANs. In NIPS, 2016.
Salimans, T., Zhang, H., Radford, A., and Metaxas, D. N. Improving GANs using optimal transport. In ICLR, 2018.
Snderby, C. K., Caballero, J., Theis, L., Shi, W., and Huszar, F. Amortised MAP inference for image super-resolution. In ICLR, 2017.
Taigman, Y., Polyak, A., and Wolf, L. Unsupervised cross-domain image generation. In ICLR, 2017.
Tran, D., Ranganath, R., and Blei, D. M. Deep and hierarchical implicit models. arXiv:1702.08896, 2017.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. Attention is all you need. arXiv:1706.03762, 2017.
Wang, X., Girshick, R., Gupta, A., and He, K. Non-local neural networks. In CVPR, 2018.
Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A. C., Salakhutdinov, R., Zemel, R. S., and Bengio, Y. Show, attend and tell: Neural image caption generation with visual attention. In ICML, 2015.
Xu, T., Zhang, P., Huang, Q., Zhang, H., Gan, Z., Huang, X., and He, X. AttnGAN: Fine-grained text to image generation with attentional generative adversarial networks. In CVPR, 2018.
Xue, Y., Xu, T., Zhang, H., Long, L. R., and Huang, X. SegAN: Adversarial network with multi-scale L1 loss for medical image segmentation. Neuroinformatics, pp. 1–10, 2018.
Yang, Z., He, X., Gao, J., Deng, L., and Smola, A. J. Stacked attention networks for image question answering. In CVPR, 2016.
Zhang, H., Xu, T., Li, H., Zhang, S., Wang, X., Huang, X., and Metaxas, D. N. StackGAN++: Realistic image synthesis with stacked generative adversarial networks. TPAMI.
Zhang, H., Xu, T., Li, H., Zhang, S., Wang, X., Huang, X., and Metaxas, D. StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. In ICCV, 2017.
Zhao, J., Mathieu, M., and LeCun, Y. Energy-based generative adversarial network. In ICLR, 2017.
Zhou, S., Gordon, M., Krishna, R., Narcomey, A., Morina, D., and Bernstein, M. S. HYPE: Human eye perceptual evaluation of generative models. CoRR, abs/1904.01121, 2019. URL http://arxiv.org/abs/1904.01121.
Zhu, J.-Y., Park, T., Isola, P., and Efros, A. A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In ICCV, 2017.
