Spatial Language Understanding for Object Search in Partially Observed City-scale Environments

Abstract

Humans use spatial language to naturally describe object locations and their relations. Interpreting spatial language not only adds a perceptual modality for robots, but also reduces the barrier of interfacing with humans. Previous work primarily considers spatial language as goal specification for instruction following tasks in fully observable domains, often paired with reference paths for reward-based learning. However, spatial language is inherently subjective and potentially ambiguous or misleading. Hence, in this paper, we consider spatial language as a form of stochastic observation. We propose SLOOP (Spatial Language Object-Oriented POMDP), a new framework for partially observable decision making with a probabilistic observation model for spatial language. We apply SLOOP to object search in city-scale environments. To interpret ambiguous, context-dependent prepositions (e.g., front), we design a simple convolutional neural network that predicts the language provider's latent frame of reference (FoR) given the environment context. Search strategies are computed via an online POMDP planner based on Monte Carlo Tree Search. Evaluation based on crowdsourced language data, collected over areas of five cities in OpenStreetMap, shows that our approach achieves faster search and higher success rate compared to baselines, with a wider margin as the spatial language becomes more complex. Finally, we demonstrate the proposed method in AirSim, a realistic simulator where a drone is tasked to find cars in a neighborhood environment.
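
To make the idea of spatial language as a stochastic observation concrete, the following is a minimal sketch of a Bayesian belief update over a grid map. It is not the SLOOP observation model. The Gaussian "near" likelihood, the grid size, the landmark position, and the names near_likelihood and update_belief are all assumptions made for illustration.

```python
import math


def near_likelihood(cell, landmark, sigma=2.0):
    """Hypothetical Pr(description says 'near landmark' | object is at cell).

    Assumption: likelihood decays as a Gaussian of the distance to the landmark.
    """
    dx = cell[0] - landmark[0]
    dy = cell[1] - landmark[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))


def update_belief(belief, likelihood):
    """Bayes update: b'(s) is proportional to Pr(o | s) * b(s)."""
    unnormalized = {s: likelihood(s) * p for s, p in belief.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        # The observation rules out every cell; fall back to the prior.
        return dict(belief)
    return {s: p / total for s, p in unnormalized.items()}


# Uniform prior over a 10 x 10 grid: the object could be anywhere.
width, height = 10, 10
cells = [(x, y) for x in range(width) for y in range(height)]
prior = {s: 1.0 / len(cells) for s in cells}

# Observation: "the car is near the landmark", with the landmark at (7, 3).
landmark = (7, 3)
posterior = update_belief(prior, lambda s: near_likelihood(s, landmark))

# The most probable cell after the update.
best = max(posterior, key=posterior.get)
```

Because the description only reweights the belief rather than fixing a goal, a misleading or ambiguous phrase shifts probability mass without eliminating other locations, and a planner can still search them.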
