Differentiable Parsing and Visual Grounding of Natural Language Instructions for Object Placement

  • 2023-03-13 08:09:20
  • Zirui Zhao, Wee Sun Lee, David Hsu


We present a new method, PARsing And visual GrOuNding (ParaGon), for grounding natural language in object placement tasks. Natural language generally describes objects and spatial relations with compositionality and ambiguity, two major obstacles to effective language grounding. For compositionality, ParaGon parses a language instruction into an object-centric graph representation to ground objects individually. For ambiguity, ParaGon uses a novel particle-based graph neural network to reason about object placements with uncertainty. Essentially, ParaGon integrates a parsing algorithm into a probabilistic, data-driven learning framework. It is fully differentiable and trained end-to-end from data for robustness against complex, ambiguous language input.
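To make the two ideas in the abstract concrete, the following is a minimal illustrative sketch and not ParaGon's implementation. It assumes a hand-written relation triple in place of the learned parser, and a hand-coded scoring function in place of the particle-based graph neural network. The names `Relation`, `left_of_score`, `weighted_particles`, and `expected_position`, the example instruction, and all numeric constants are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One edge of an object-centric graph: subject --relation--> reference."""
    subject: str
    relation: str
    reference: str


def left_of_score(point, ref):
    # Hand-coded stand-in for a learned relation model (an assumption):
    # high when the point is left of the reference and at a similar height.
    dx = ref[0] - point[0]
    dy = point[1] - ref[1]
    return 1.0 / (1.0 + math.exp(-10.0 * dx)) * math.exp(-dy * dy / 0.02)


def weighted_particles(ref, n=500, seed=0):
    # Sample candidate placements uniformly on a unit-square table,
    # then weight each particle by how well it satisfies the relation.
    rng = random.Random(seed)
    particles = [(rng.random(), rng.random()) for _ in range(n)]
    raw = [left_of_score(p, ref) for p in particles]
    total = sum(raw)
    weights = [w / total for w in raw]
    return particles, weights


def expected_position(particles, weights):
    # Weighted mean of the particle set as a single placement estimate.
    x = sum(w * p[0] for p, w in zip(particles, weights))
    y = sum(w * p[1] for p, w in zip(particles, weights))
    return x, y


# Hypothetical instruction: "put the cup to the left of the plate".
graph = [Relation("cup", "left of", "plate")]
plate = (0.6, 0.5)  # assumed grounded position of the reference object
particles, weights = weighted_particles(plate)
estimate = expected_position(particles, weights)
```

The weighted particle set keeps many plausible placements alive instead of committing to one, which is the intuition behind reasoning about ambiguous instructions with uncertainty. In the method described above, both the parse and the relation scores would be learned end-to-end rather than written by hand.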


