Compositional GAN: Learning Conditional Image Composition

Abstract

Generative Adversarial Networks (GANs) can produce images of surprising complexity and realism, but are generally modeled to sample from a single latent source, ignoring the explicit spatial interaction between multiple entities that could be present in a scene. Capturing such complex interactions between different objects in the world, including their relative scaling, spatial layout, occlusion, or viewpoint transformation, is a challenging problem. In this work, we propose to model object composition in a GAN framework as a self-consistent composition-decomposition network. Our model is conditioned on the object images from their marginal distributions to generate a realistic image from their joint distribution by explicitly learning the possible interactions. We evaluate our model through qualitative experiments and user evaluations in both scenarios, in which either paired or unpaired examples of the individual object images and the joint scenes are given during training. Our results reveal that the learned model captures potential interactions between the two object domains given as input to output new instances of composed scenes at test time in a reasonable fashion.
