Open-vocabulary Queryable Scene Representations for Real World Planning

Abstract

Large language models (LLMs) have unlocked new capabilities for task planning from human instructions. However, prior attempts to apply LLMs to real-world robotic tasks are limited by the lack of grounding in the surrounding scene. In this paper, we develop NLMap, an open-vocabulary and queryable scene representation, to address this problem. NLMap serves as a framework to gather and integrate contextual information into LLM planners, allowing them to see and query available objects in the scene before generating a context-conditioned plan. NLMap first establishes a natural-language-queryable scene representation with visual language models (VLMs). An LLM-based object proposal module parses instructions and proposes the involved objects, which are then used to query the scene representation for object availability and location. An LLM planner then plans with this information about the scene. NLMap allows robots to operate without a fixed list of objects or executable options, enabling real robot operation unachievable by previous methods. Project website: https://nlmap-saycan.github.io
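
The abstract does not give implementation details, so the following is only a minimal Python sketch of the query-then-plan flow it describes. It assumes that each scene region has already been embedded by a VLM (for example CLIP) into the same vector space as text queries, that each region has an estimated 3D location, and that the LLM object-proposal step returns a plain list of object names. `QueryableSceneMap`, `gather_scene_context`, `format_for_planner`, the similarity threshold, and the toy embeddings are all hypothetical illustrations, not NLMap's actual code.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Location = Tuple[float, float, float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class SceneEntry:
    embedding: Vector   # assumed: VLM image embedding of one region crop
    location: Location  # assumed: estimated 3D position of that region


@dataclass
class QueryableSceneMap:
    """Hypothetical scene map: region embeddings plus locations, queried by text embedding."""
    threshold: float = 0.8  # hypothetical cutoff for deciding an object is present
    entries: List[SceneEntry] = field(default_factory=list)

    def add(self, embedding: Vector, location: Location) -> None:
        self.entries.append(SceneEntry(embedding, location))

    def query(self, text_embedding: Vector) -> List[Location]:
        """Locations of regions matching the query, best match first; empty if none."""
        scored = [(cosine_similarity(e.embedding, text_embedding), e.location)
                  for e in self.entries]
        hits = [pair for pair in scored if pair[0] >= self.threshold]
        hits.sort(key=lambda pair: pair[0], reverse=True)
        return [loc for _, loc in hits]


def gather_scene_context(proposed_objects: List[str],
                         encode_text: Callable[[str], Vector],
                         scene: QueryableSceneMap) -> Dict[str, List[Location]]:
    """Query the scene once per object name proposed by the (assumed) LLM step."""
    return {name: scene.query(encode_text(name)) for name in proposed_objects}


def format_for_planner(context: Dict[str, List[Location]]) -> str:
    """Render availability and best-match location as text for a planner prompt."""
    lines = []
    for name, locations in context.items():
        if locations:
            lines.append(f"{name}: available at {locations[0]}")
        else:
            lines.append(f"{name}: not found in scene")
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy 3-D embeddings standing in for real VLM text features (hypothetical values).
    toy_text_embeddings = {
        "apple": [0.95, 0.05, 0.0],
        "sponge": [0.1, 0.9, 0.0],
        "banana": [0.0, 0.0, 1.0],
    }
    scene = QueryableSceneMap(threshold=0.9)
    scene.add([1.0, 0.0, 0.0], (1.0, 2.0, 0.5))  # toy region resembling an apple
    scene.add([0.0, 1.0, 0.0], (3.0, 0.5, 0.8))  # toy region resembling a sponge

    # Stand-in for the LLM object-proposal output for an instruction like
    # "bring me a fruit and a sponge".
    proposed = ["apple", "banana", "sponge"]
    context = gather_scene_context(proposed, toy_text_embeddings.__getitem__, scene)
    print(format_for_planner(context))
    # apple: available at (1.0, 2.0, 0.5)
    # banana: not found in scene
    # sponge: available at (3.0, 0.5, 0.8)
```

Thresholding cosine similarity is just one simple way to turn retrieval scores into a yes/no availability signal that a planner can condition on; the cutoff used here is arbitrary and would need tuning against a real VLM's score distribution.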
