Image Transformer

  • 2018-02-15 20:37:15
  • Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Łukasz Kaiser, Noam Shazeer, Alexander Ku
  • 47

Abstract

Image generation has been successfully cast as an autoregressive sequence generation or transformation problem. Recent work has shown that self-attention is an effective way of modeling textual sequences. In this work, we generalize a recently proposed model architecture based on self-attention, the Transformer, to a sequence modeling formulation of image generation with a tractable likelihood. By restricting the self-attention mechanism to attend to local neighborhoods we significantly increase the size of images the model can process in practice, despite maintaining significantly larger receptive fields per layer than typical convolutional neural networks. We propose another extension of self-attention allowing it to efficiently take advantage of the two-dimensional nature of images. While conceptually simple, our generative models significantly outperform the current state of the art in image generation on ImageNet, improving the best published negative log-likelihood on ImageNet from 3.83 to 3.77. We also present results on image super-resolution with a large magnification ratio, applying an encoder-decoder configuration of our architecture. In a human evaluation study, we show that our super-resolution models improve significantly over previously published super-resolution models. Images generated by the model fool human observers three times more often than the previous state of the art.
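The idea of restricting self-attention to local neighborhoods can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how neighborhoods are chosen, so the fixed-size causal window over a flattened pixel sequence is an assumption, and the function name `local_causal_attention` and its `window` parameter are hypothetical.

```python
import math


def local_causal_attention(queries, keys, values, window):
    """Causal self-attention over a flattened (1D) pixel sequence.

    Assumption for illustration: position i attends only to the
    `window` most recent positions, i.e. max(0, i - window + 1) .. i.
    """
    dim = len(queries[0])
    outputs = []
    for i, q in enumerate(queries):
        start = max(0, i - window + 1)
        # Scaled dot-product scores against the local, already-generated context.
        scores = [
            sum(qc * kc for qc, kc in zip(q, keys[j])) / math.sqrt(dim)
            for j in range(start, i + 1)
        ]
        # Numerically stable softmax over the local neighborhood.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the value vectors inside the neighborhood.
        out = [
            sum(w * values[start + k][c] for k, w in enumerate(weights))
            for c in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Toy usage: four "pixels" with two-dimensional embeddings.
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
out = local_causal_attention(seq, seq, seq, window=2)
```

Under this assumption, each position compares itself against at most `window` others instead of the whole prefix, so the cost per position no longer grows with image size. That is the property that lets a model of this kind handle larger images.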

 
