Encoding Spatial Relations from Natural Language

Abstract

Natural language processing has made significant inroads into learning thesemantics of words through distributional approaches, however representationslearnt via these methods fail to capture certain kinds of information implicitin the real world. In particular, spatial relations are encoded in a way thatis inconsistent with human spatial reasoning and lacking invariance toviewpoint changes. We present a system capable of capturing the semantics ofspatial relations such as behind, left of, etc from natural language. Our keycontributions are a novel multi-modal objective based on generating images ofscenes from their textual descriptions, and a new dataset on which to train it.We demonstrate that internal representations are robust to meaning preservingtransformations of descriptions (paraphrase invariance), while viewpointinvariance is an emergent property of the system.

Quick Read (beta)

loading the full paper ...