Distilling and Retrieving Generalizable Knowledge for Robot Manipulation via Language Corrections

Abstract

Today's robot policies exhibit subpar performance when faced with the challenge of generalizing to novel environments. Human corrective feedback is a crucial form of guidance to enable such generalization. However, adapting to and learning from online human corrections is a non-trivial endeavor: not only do robots need to remember human feedback over time to retrieve the right information in new settings and reduce the intervention rate, but they also need to respond to feedback that can range from arbitrary corrections about high-level human preferences to low-level adjustments to skill parameters. In this work, we present Distillation and Retrieval of Online Corrections (DROC), a large language model (LLM)-based system that can respond to arbitrary forms of language feedback, distill generalizable knowledge from corrections, and retrieve relevant past experiences based on textual and visual similarity for improving performance in novel settings. DROC is able to respond to a sequence of online language corrections that address failures in both high-level task plans and low-level skill primitives. We demonstrate that DROC effectively distills the relevant information from the sequence of online corrections into a knowledge base and retrieves that knowledge in settings with new task or object instances. DROC outperforms other techniques that directly generate robot code via LLMs by using only half of the total number of corrections needed in the first round, and it requires few to no corrections after two iterations. We show further results, videos, prompts and code at https://sites.google.com/stanford.edu/droc.
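The distill-then-retrieve loop described above can be illustrated with a toy sketch. This is not DROC's implementation: the `KnowledgeBase` class, the `embed` and `cosine` helpers, and the example corrections are all hypothetical, and a bag-of-words cosine similarity stands in for the textual and visual similarity the abstract mentions. It only shows the general shape of storing distilled corrections and looking them up by similarity in a new setting.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: stands in for a real text or image encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class KnowledgeBase:
    """Hypothetical store of (task context, distilled knowledge) pairs."""
    entries: list = field(default_factory=list)

    def distill(self, context: str, knowledge: str) -> None:
        # In a full system an LLM would summarize the correction first;
        # here the summary is passed in directly.
        self.entries.append((context, knowledge))

    def retrieve(self, query: str, k: int = 1) -> list:
        # Rank stored contexts by similarity to the new task description.
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, embed(e[0])), reverse=True)
        return [knowledge for _, knowledge in ranked[:k]]


# Made-up corrections, for illustration only.
kb = KnowledgeBase()
kb.distill("open the top drawer", "pull the handle 5 cm further before releasing")
kb.distill("put the cup on the shelf", "user prefers cups on the lower shelf")

# A new but related task retrieves the knowledge distilled from the drawer correction.
result = kb.retrieve("open the bottom drawer")
print(result)
```

In this sketch the new task shares most of its words with the stored drawer context, so that entry ranks first. A real system would replace `embed` with learned text and image embeddings.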
