Retouchdown: Adding Touchdown to StreetLearn as a Shareable Resource for Language Grounding Tasks in Street View

Abstract

The Touchdown dataset (Chen et al., 2019) provides instructions by human annotators for navigation through New York City streets and for resolving spatial descriptions at a given location. To enable the wider research community to work effectively with the Touchdown tasks, we are publicly releasing the 29k raw Street View panoramas needed for Touchdown. We follow the process used for the StreetLearn data release (Mirowski et al., 2019) to check panoramas for personally identifiable information and blur them as necessary. These have been added to the StreetLearn dataset and can be obtained via the same process as used previously for StreetLearn. We also provide a reference implementation for both of the Touchdown tasks: vision and language navigation (VLN) and spatial description resolution (SDR). We compare our model results to those given in Chen et al. (2019) and show that the panoramas we have added to StreetLearn fully support both Touchdown tasks and can be used effectively for further research and comparison.
