Guiding Policies with Language via Meta-Learning

Abstract

Behavioral skills or policies for autonomous agents are conventionally learned from reward functions, via reinforcement learning, or from demonstrations, via imitation learning. However, both modes of task specification have their disadvantages: reward functions require manual engineering, while demonstrations require a human expert to be able to actually perform the task in order to generate the demonstration. Instruction following from natural language instructions provides an appealing alternative: in the same way that we can specify goals to other humans simply by speaking or writing, we would like to be able to specify tasks for our machines. However, a single instruction may be insufficient to fully communicate our intent or, even if it is, may be insufficient for an autonomous agent to actually understand how to perform the desired task. In this work, we propose an interactive formulation of the task specification problem, where iterative language corrections are provided to an autonomous agent, guiding it in acquiring the desired skill. Our proposed language-guided policy learning algorithm can integrate an instruction and a sequence of corrections to acquire new skills very quickly. In our experiments, we show that this method can enable a policy to follow instructions and corrections for simulated navigation and manipulation tasks, substantially outperforming direct, non-interactive instruction following.
