Image Generation from Scene Graphs

Abstract

To truly understand the visual world, our models should be able not only to recognize images but also to generate them. To this end, there has been exciting recent progress on generating images from natural language descriptions. These methods give stunning results on limited domains such as descriptions of birds or flowers, but struggle to faithfully reproduce complex sentences with many objects and relationships. To overcome this limitation, we propose a method for generating images from scene graphs, enabling explicit reasoning about objects and their relationships. Our model uses graph convolution to process input graphs, computes a scene layout by predicting bounding boxes and segmentation masks for objects, and converts the layout to an image with a cascaded refinement network. The network is trained adversarially against a pair of discriminators to ensure realistic outputs. We validate our approach on Visual Genome and COCO-Stuff, where qualitative results, ablations, and user studies demonstrate our method's ability to generate complex images with multiple objects.
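The abstract names graph convolution as the first stage but does not say how it operates on a scene graph. The following is a minimal pure-Python sketch of one plausible design, under stated assumptions. Each (subject, predicate, object) triple is concatenated and passed through a shared map that yields new candidate vectors for all three. Predicates take their candidate directly, while each object averages the candidates it receives from every triple it appears in. The class name `GraphConvLayer`, the single linear layer followed by ReLU, the random weights, and the rule that objects appearing in no triple keep their input vector are all illustrative assumptions, not the paper's implementation.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def relu_linear(weights: Matrix, x: Vector) -> Vector:
    """Multiply an (out_dim x in_dim) weight matrix by x, then apply ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


class GraphConvLayer:
    """Hypothetical graph-convolution layer over (subject, predicate, object) triples."""

    def __init__(self, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.dim = dim
        # Assumption: one shared weight matrix maps the concatenated triple (3*dim)
        # to new subject, predicate, and object candidates (3*dim).
        self.weights = [[rng.uniform(-0.5, 0.5) for _ in range(3 * dim)]
                        for _ in range(3 * dim)]

    def __call__(self, obj_vecs: List[Vector], pred_vecs: List[Vector],
                 edges: List[Tuple[int, int]]) -> Tuple[List[Vector], List[Vector]]:
        d = self.dim
        sums = [[0.0] * d for _ in obj_vecs]   # per-object sum of candidate vectors
        counts = [0] * len(obj_vecs)            # number of candidates each object received
        new_preds: List[Vector] = []
        for (s, o), p in zip(edges, pred_vecs):
            out = relu_linear(self.weights, obj_vecs[s] + p + obj_vecs[o])
            cand_s, cand_p, cand_o = out[:d], out[d:2 * d], out[2 * d:]
            new_preds.append(cand_p)            # predicates are updated directly
            for idx, cand in ((s, cand_s), (o, cand_o)):
                sums[idx] = [a + b for a, b in zip(sums[idx], cand)]
                counts[idx] += 1
        # Average each object's candidates; objects in no triple keep their input vector.
        new_objs = [[v / counts[i] for v in sums[i]] if counts[i] else list(obj_vecs[i])
                    for i in range(len(obj_vecs))]
        return new_objs, new_preds


# Toy scene graph: objects 0="sky", 1="grass", 2="sheep", 3="tree" (tree is unconnected).
rng = random.Random(1)
dim = 4
obj_vecs = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(4)]
pred_vecs = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(2)]
edges = [(0, 1),   # sky   --above-->       grass
         (2, 1)]   # sheep --standing on--> grass
layer = GraphConvLayer(dim)
new_objs, new_preds = layer(obj_vecs, pred_vecs, edges)
print(len(new_objs), len(new_preds))  # 4 2
```

Stacking several such layers lets information flow along longer paths in the graph. In this sketch's framing, a downstream layout stage could then predict a bounding box and segmentation mask from each output object vector, as the abstract describes.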
