Learning Spatial Pyramid Attentive Pooling in Image Synthesis and Image-to-Image Translation

  • 2019-01-18 16:23:37
  • Wei Sun, Tianfu Wu

Abstract

Image synthesis and image-to-image translation are two important generative learning tasks. Remarkable progress has been made by learning Generative Adversarial Networks (GANs)~\cite{goodfellow2014generative} and cycle-consistent GANs (CycleGANs)~\cite{zhu2017unpaired}, respectively. This paper presents a method of learning Spatial Pyramid Attentive Pooling (SPAP), a novel architectural unit that can be easily integrated into both generators and discriminators in GANs and CycleGANs. The proposed SPAP integrates atrous spatial pyramid pooling (ASPP)~\cite{chen2018deeplab}, a proposed cascade attention mechanism, and residual connections~\cite{he2016deep}. It leverages the advantages of the three components to facilitate effective end-to-end generative learning: (i) the capability of fusing multi-scale information by ASPP; (ii) the capability of capturing relative importance between spatial locations (especially multi-scale context) or between feature channels by attention; (iii) the capability of preserving information and enhancing optimization feasibility by residual connections. Coarse-to-fine and fine-to-coarse SPAP are studied, and intriguing attention maps are observed in both tasks. In experiments, the proposed SPAP is tested in GANs on the CelebA-HQ-128 dataset~\cite{karras2017progressive}, and in CycleGANs on image-to-image translation datasets including the Cityscapes dataset~\cite{cordts2016cityscapes} and the Facade and Aerial Maps datasets~\cite{zhu2017unpaired}, obtaining better performance in both settings.
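
To make the three ingredients named in the abstract concrete, here is a minimal toy sketch in pure Python on a 1D signal. It is not the authors' implementation: the abstract does not specify the cascade attention, so this sketch substitutes a plain per-location softmax over scales, and the function names (`atrous_conv1d`, `spap_toy`), the smoothing kernel, and the dilation rates are all hypothetical choices made for illustration.

```python
import math


def atrous_conv1d(x, kernel, rate):
    """Dilated (atrous) 1D convolution with zero padding; output has len(x) entries."""
    center = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - center) * rate  # taps are spaced `rate` samples apart
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def softmax(values):
    """Numerically stable softmax over a short list of scores."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def spap_toy(x, kernel=(0.25, 0.5, 0.25), rates=(1, 2, 4)):
    """Toy multi-scale attentive pooling with a residual connection.

    Assumptions (not from the paper): each branch response doubles as its own
    attention score, and scales are fused by one softmax, not a cascade.
    """
    # (i) Multi-scale branches: one atrous convolution per dilation rate.
    branches = [atrous_conv1d(x, kernel, r) for r in rates]
    fused, attention = [], []
    for i in range(len(x)):
        # (ii) Attention: relative importance of each scale at location i.
        weights = softmax([b[i] for b in branches])
        attention.append(weights)
        pooled = sum(w * b[i] for w, b in zip(weights, branches))
        # (iii) Residual connection: add the input back to preserve information.
        fused.append(x[i] + pooled)
    return fused, attention


if __name__ == "__main__":
    signal = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
    out, attn = spap_toy(signal)
    print(out)
    print(attn[3])  # attention over the three scales at the impulse location
```

In a real generator or discriminator the branches would be learned 2D dilated convolutions over feature maps and the attention scores would come from learned layers; the toy only shows how multi-scale fusion, attention weighting, and the residual path fit together.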

 
