Accurately and Efficiently Interpreting Human-Robot Instructions of Varying Granularities

Abstract

Humans can ground natural language commands to tasks at both abstract and fine-grained levels of specificity. For instance, a human forklift operator can be instructed to perform a high-level action, like "grab a pallet" or a low-level action like "tilt back a little bit." While robots are also capable of grounding language commands to tasks, previous methods implicitly assume that all commands and tasks reside at a single, fixed level of abstraction. Additionally, methods that do not use multiple levels of abstraction encounter inefficient planning and execution times as they solve tasks at a single level of abstraction with large, intractable state-action spaces closely resembling real world complexity. In this work, by grounding commands to all the tasks or subtasks available in a hierarchical planning framework, we arrive at a model capable of interpreting language at multiple levels of specificity ranging from coarse to more granular. We show that the accuracy of the grounding procedure is improved when simultaneously inferring the degree of abstraction in language used to communicate the task. Leveraging hierarchy also improves efficiency: our proposed approach enables a robot to respond to a command within one second on 90% of our tasks, while baselines take over twenty seconds on half the tasks. Finally, we demonstrate that a real, physical robot can ground commands at multiple levels of abstraction allowing it to efficiently plan different subtasks within the same planning hierarchy.
