Toward Multimodal Image-to-Image Translation

  • 2017-11-30 18:59:01
  • Jun-Yan Zhu, Richard Zhang, Deepak Pathak, Trevor Darrell, Alexei A. Efros, Oliver Wang, Eli Shechtman
  • 52


Many image-to-image translation problems are ambiguous, as a single input image may correspond to multiple possible outputs. In this work, we aim to model a distribution of possible outputs in a conditional generative modeling setting. The ambiguity of the mapping is distilled in a low-dimensional latent vector, which can be randomly sampled at test time. A generator learns to map the given input, combined with this latent code, to the output. We explicitly encourage the connection between output and the latent code to be invertible. This helps prevent a many-to-one mapping from the latent code to the output during training, also known as the problem of mode collapse, and produces more diverse results. We explore several variants of this approach by employing different training objectives, network architectures, and methods of injecting the latent code. Our proposed method encourages bijective consistency between the latent encoding and output modes. We present a systematic comparison of our method and other variants on both perceptual realism and diversity.
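The invertibility idea in the abstract can be illustrated with a toy sketch. It is not the paper's implementation: the linear `generator` and `encoder`, their weights, and the L1 `latent_recovery_loss` are all hypothetical stand-ins chosen for illustration, assuming only that a generator maps an input plus a latent code to an output and that an encoder tries to recover the code from that output.

```python
import random

def generator(a, z, w_a, w_z):
    """Toy linear stand-in for a generator G(a, z): input a plus latent code z -> output."""
    return [
        sum(wa * ai for wa, ai in zip(row_a, a)) + sum(wz * zi for wz, zi in zip(row_z, z))
        for row_a, row_z in zip(w_a, w_z)
    ]

def encoder(b, w_e):
    """Toy linear stand-in for an encoder E(b): output -> estimated latent code."""
    return [sum(w * bi for w, bi in zip(row, b)) for row in w_e]

def latent_recovery_loss(z, z_hat):
    """Mean absolute error between the sampled code and the code recovered from the output."""
    return sum(abs(zi - zh) for zi, zh in zip(z, z_hat)) / len(z)

# Hypothetical weights: a 2-d input, a 1-d latent code, a 2-d output.
W_A = [[1.0, 0.0], [0.0, 1.0]]   # passes the input through unchanged
W_Z_USED = [[1.0], [-1.0]]       # this generator lets z change the output
W_Z_IGNORED = [[0.0], [0.0]]     # this generator ignores z (mode collapse)
W_E = [[0.5, -0.5]]              # encoder that reads z back from the output

a = [1.0, 1.0]
z = [0.5]

# When the generator uses z, the encoder can recover it and the loss is zero.
loss_used = latent_recovery_loss(z, encoder(generator(a, z, W_A, W_Z_USED), W_E))

# When the generator ignores z, the code cannot be recovered and the loss is positive.
loss_ignored = latent_recovery_loss(z, encoder(generator(a, z, W_A, W_Z_IGNORED), W_E))

# At test time, different randomly sampled codes give different outputs for the same input.
rng = random.Random(0)
samples = [generator(a, [rng.gauss(0.0, 1.0)], W_A, W_Z_USED) for _ in range(3)]

print(loss_used, loss_ignored)
print(samples)
```

In this toy setup, penalizing the recovery loss favors the generator whose output actually depends on the latent code, which is the intuition behind encouraging an invertible connection between output and code.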

