Image-to-Image Translation with Text Guidance

  • 2020-02-12 21:09:15
  • Bowen Li, Xiaojuan Qi, Philip H. S. Torr, Thomas Lukasiewicz

Abstract

The goal of this paper is to embed controllable factors, i.e., natural language descriptions, into image-to-image translation with generative adversarial networks, which allows text descriptions to determine the visual attributes of synthetic images. We propose four key components: (1) the implementation of part-of-speech tagging to filter out non-semantic words in the given description, (2) the adoption of an affine combination module to effectively fuse different modality text and image features, (3) a novel refined multi-stage architecture to strengthen the differential ability of discriminators and the rectification ability of generators, and (4) a new structure loss to further improve discriminators to better distinguish real and synthetic images. Extensive experiments on the COCO dataset demonstrate that our method has a superior performance on both visual realism and semantic consistency with given descriptions.
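
As a rough illustration of component (1), the sketch below filters a description by part-of-speech tag. It is not the authors' implementation: the tag lookup `TOY_POS` is a hand-written stand-in for a real tagger, and both the choice to keep only nouns and adjectives (`KEEP_TAGS`) and the function name `filter_description` are assumptions made for this example, since the abstract does not say which tags count as semantic.

```python
# Toy part-of-speech lookup (hypothetical; a real system would use a trained tagger).
TOY_POS = {
    "a": "DET", "the": "DET",
    "on": "ADP", "with": "ADP",
    "is": "VERB", "parked": "VERB",
    "red": "ADJ", "yellow": "ADJ",
    "bus": "NOUN", "stripes": "NOUN", "street": "NOUN",
}

# Assumption: nouns and adjectives carry the visual attributes worth keeping.
KEEP_TAGS = {"NOUN", "ADJ"}


def filter_description(description, lexicon=TOY_POS, keep=KEEP_TAGS):
    """Return the words of `description` whose tag is in `keep`.

    Words missing from the lexicon have no tag and are dropped.
    """
    # Lowercase, strip simple punctuation, and split on whitespace.
    tokens = description.lower().replace(".", " ").replace(",", " ").split()
    return [tok for tok in tokens if lexicon.get(tok) in keep]


if __name__ == "__main__":
    caption = "A red bus with yellow stripes is parked on the street."
    print(filter_description(caption))
    # → ['red', 'bus', 'yellow', 'stripes', 'street']
```

In this sketch, determiners, prepositions, and verbs are discarded, so only the words that name objects or describe their appearance would be passed on to a text encoder.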

 
