TransGAN: Two Transformers Can Make One Strong GAN

  • 2021-02-16 05:51:12
  • Yifan Jiang, Shiyu Chang, Zhangyang Wang
  • 25

Abstract

The recent explosive interest in transformers has suggested their potential to become powerful "universal" models for computer vision tasks, such as classification, detection, and segmentation. But how much further can transformers go: are they ready to take on more notoriously difficult vision tasks, e.g., generative adversarial networks (GANs)? Driven by that curiosity, we conduct the first pilot study in building a GAN completely free of convolutions, using only pure transformer-based architectures. Our vanilla GAN architecture, dubbed TransGAN, consists of a memory-friendly transformer-based generator that progressively increases feature resolution while decreasing embedding dimension, and a patch-level discriminator that is also transformer-based. We then demonstrate that TransGAN notably benefits from data augmentations (more than standard GANs), a multi-task co-training strategy for the generator, and a locally initialized self-attention that emphasizes the neighborhood smoothness of natural images. Equipped with those findings, TransGAN can effectively scale up with bigger models and high-resolution image datasets. Our best architecture achieves highly competitive performance compared to current state-of-the-art GANs based on convolutional backbones. Specifically, TransGAN sets a new state-of-the-art IS score of 10.10 and FID score of 25.32 on STL-10. It also reaches a competitive IS score of 8.64 and FID score of 11.89 on CIFAR-10, and an FID score of 12.23 on CelebA 64×64. We conclude with a discussion of the current limitations and future potential of TransGAN. The code is available at https://github.com/VITA-Group/TransGAN.
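The abstract does not say how the generator trades embedding dimension for resolution. As a rough illustration only, the sketch below assumes a pixel-shuffle-style rearrangement of the token sequence: each token's channels are split into a 2×2 block of new tokens, so the token grid doubles in each spatial direction while the embedding dimension shrinks by a factor of four. The function name `upsample_tokens`, the NumPy implementation, and the sizes used are hypothetical and are not taken from the TransGAN code.

```python
import numpy as np

def upsample_tokens(tokens, height, width, scale=2):
    """Trade embedding dimension for resolution (illustrative assumption).

    tokens: array of shape (height * width, channels), one row per token.
    Returns an array of shape (height*scale * width*scale, channels / scale**2).
    """
    n, c = tokens.shape
    if n != height * width:
        raise ValueError("token count must equal height * width")
    if c % (scale * scale) != 0:
        raise ValueError("channels must be divisible by scale**2")
    c_out = c // (scale * scale)
    # Split each token's channels into a (scale, scale) block of smaller tokens.
    grid = tokens.reshape(height, width, scale, scale, c_out)
    # Interleave the block axes with the spatial axes: (H, s, W, s, C_out).
    grid = grid.transpose(0, 2, 1, 3, 4)
    # Flatten back to a token sequence on the finer grid.
    return grid.reshape(height * scale * width * scale, c_out)

# Hypothetical sizes: an 8x8 grid of 1024-dim tokens becomes 16x16 of 256-dim.
stage_in = np.zeros((8 * 8, 1024))
stage_out = upsample_tokens(stage_in, 8, 8)
print(stage_out.shape)
```

The rearrangement adds no parameters and preserves the total number of values, which is one plausible reading of "memory-friendly": later stages hold more tokens, but each token is proportionally narrower.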

 
