Generative Compositional Augmentations for Scene Graph Prediction

  • 2021-04-15 17:42:25
  • Boris Knyazev, Harm de Vries, Cătălina Cangea, Graham W. Taylor, Aaron Courville, Eugene Belilovsky

Abstract

Inferring objects and their relationships from an image in the form of a scene graph is useful in many applications at the intersection of vision and language. In this work, we consider a challenging problem of compositional generalization that emerges in this task due to a long tail data distribution. Current scene graph generation models are trained on a tiny fraction of the distribution corresponding to the most frequent compositions, e.g. <cup, on, table>. However, test images might contain zero- and few-shot compositions of objects and relationships, e.g. <cup, on, surfboard>. Despite each of the object categories and the predicate (e.g. 'on') being frequent in the training data, the models often fail to properly understand such unseen or rare compositions. To improve generalization, it is natural to attempt increasing the diversity of the training distribution. However, in the graph domain this is non-trivial. To that end, we propose a method to synthesize rare yet plausible scene graphs by perturbing real ones. We then propose and empirically study a model based on conditional generative adversarial networks (GANs) that allows us to generate visual features of perturbed scene graphs and learn from them in a joint fashion. When evaluated on the Visual Genome dataset, our approach yields marginal, but consistent improvements in zero- and few-shot metrics. We analyze the limitations of our approach indicating promising directions for future research.

 
