Object Ordering with Bidirectional Matchings for Visual Reasoning

  • 2018-04-18 18:39:17
  • Hao Tan, Mohit Bansal

Abstract

Visual reasoning with compositional natural language instructions, e.g., based on the newly released Cornell Natural Language Visual Reasoning (NLVR) dataset, is a challenging task, where the model needs to have the ability to create an accurate mapping between the diverse phrases and the several objects placed in complex arrangements in the image. Further, this mapping needs to be processed to answer the question in the statement given the ordering and relationship of the objects across three similar images. In this paper, we propose a novel end-to-end neural model for the NLVR task, where we first use joint bidirectional attention to build a two-way conditioning between the visual information and the language phrases. Next, we use an RL-based pointer network to sort and process the varying number of unordered objects (so as to match the order of the statement phrases) in each of the three images and then pool over the three decisions. Our model achieves strong improvements (of 4-6% absolute) over the state-of-the-art on both the structured representation and raw image versions of the dataset.
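
To make the idea of two-way conditioning concrete, the following is a minimal sketch of one possible bidirectional attention step, not the authors' implementation. It assumes dot-product similarity, softmax weighting, and concatenation of each vector with its attended context; the function names (`attend`, `bidirectional_attention`) and the toy feature vectors are hypothetical and chosen only for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend(queries, keys):
    # For each query vector, return a softmax-weighted average of the key vectors.
    # Assumption: similarity is a plain dot product.
    dim = len(keys[0])
    contexts = []
    for q in queries:
        weights = softmax([dot(q, k) for k in keys])
        contexts.append(
            [sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)]
        )
    return contexts

def bidirectional_attention(objects, phrases):
    # Objects attend over phrases, and phrases attend over objects.
    obj_ctx = attend(objects, phrases)
    phr_ctx = attend(phrases, objects)
    # Assumption: each vector is concatenated with its attended context.
    objects_out = [o + c for o, c in zip(objects, obj_ctx)]
    phrases_out = [p + c for p, c in zip(phrases, phr_ctx)]
    return objects_out, phrases_out

# Toy example: three object feature vectors and two phrase feature vectors.
objects = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
phrases = [[1.0, 0.0], [0.0, 1.0]]
objects_out, phrases_out = bidirectional_attention(objects, phrases)
```

In this sketch every object representation ends up conditioned on the phrases and every phrase representation on the objects, and the step works for any number of objects. The ordering step (the RL-based pointer network) and the pooling over the three images are not covered here.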

 
