GILT: Generating Images from Long Text

Abstract

Creating an image that reflects the content of a long text is a complex process that requires a sense of creativity. Examples include creating a book cover or a movie poster from a summary, or a food image from a recipe. In this paper we present the new task of generating images from long text that does not describe the visual content of the image directly. To address it, we build a system for generating high-resolution 256 $\times$ 256 images of food conditioned on their recipes. The relation between the recipe text (without its title) and the visual content of the image is vague, and the textual structure of recipes is complex: each consists of two sections (ingredients and instructions), both containing multiple sentences. We used the Recipe1M dataset to train and evaluate our model, which is based on the StackGAN-v2 architecture.
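The conditioning input described above can be pictured as a two-section structure with the title removed. What follows is a minimal sketch under our own assumptions. The `Recipe` class and the `conditioning_text` function are hypothetical names introduced only for illustration. The abstract does not say how the sections are tokenized or embedded before they reach the StackGAN-v2 generator, so the sketch stops at producing the cleaned sentence lists.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Recipe:
    # Hypothetical container for a Recipe1M-style entry (illustrative only).
    title: str
    ingredients: List[str] = field(default_factory=list)
    instructions: List[str] = field(default_factory=list)


def conditioning_text(recipe: Recipe) -> List[List[str]]:
    """Return the two text sections used as conditioning input.

    The title is deliberately dropped, so the model only sees text that
    describes the dish indirectly. Empty or whitespace-only sentences
    are filtered out and the remaining ones are stripped.
    """
    ingredients = [s.strip() for s in recipe.ingredients if s.strip()]
    instructions = [s.strip() for s in recipe.instructions if s.strip()]
    return [ingredients, instructions]


example = Recipe(
    title="Tomato Soup",
    ingredients=["2 cups tomatoes", " 1 onion ", ""],
    instructions=["Chop the onion.", "Simmer with tomatoes for 20 minutes."],
)
sections = conditioning_text(example)
print(sections)
```

Keeping the two sections separate rather than concatenating them leaves room for encoding ingredients and instructions differently. Whether the original model does so is not stated in the abstract.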
