Encoding Spatial Relations from Natural Language

  • 2018-07-04 16:38:49
  • Tiago Ramalho, Tomáš Kociský, Frederic Besse, S. M. Ali Eslami, Gábor Melis, Fabio Viola, Phil Blunsom, Karl Moritz Hermann


Natural language processing has made significant inroads into learning the semantics of words through distributional approaches; however, representations learnt via these methods fail to capture certain kinds of information implicit in the real world. In particular, spatial relations are encoded in a way that is inconsistent with human spatial reasoning and lacking invariance to viewpoint changes. We present a system capable of capturing the semantics of spatial relations such as behind, left of, etc. from natural language. Our key contributions are a novel multi-modal objective based on generating images of scenes from their textual descriptions, and a new dataset on which to train it. We demonstrate that internal representations are robust to meaning-preserving transformations of descriptions (paraphrase invariance), while viewpoint invariance is an emergent property of the system.

