Touchdown: Natural Language Navigation and Spatial Reasoning in Visual Street Environments

  • 2018-11-29 18:06:22
  • Howard Chen, Alane Suhr, Dipendra Misra, Noah Snavely, Yoav Artzi

Abstract

We study the problem of jointly reasoning about language and vision through a navigation and spatial reasoning task. We introduce the Touchdown task and dataset, where an agent must first follow navigation instructions in a real-life visual urban environment to a goal position, and then identify in the observed image a location described in natural language to find a hidden object. The data contains 9,326 examples of English instructions and spatial descriptions paired with demonstrations. We perform qualitative linguistic analysis, and show that the data displays richer use of spatial reasoning compared to related resources. Empirical analysis shows the data presents an open challenge to existing methods.

 
