Interpretable Policy Specification and Synthesis through Natural Language and RL

Abstract

Policy specification is a process by which a human can initialize a robot's behaviour and, in turn, warm-start policy optimization via Reinforcement Learning (RL). While policy specification/design is inherently a collaborative process, modern methods based on Learning from Demonstration or Deep RL lack the model interpretability and accessibility to be classified as such. Current state-of-the-art methods for policy specification rely on black-box models, which are an insufficient means of collaboration for non-expert users: these models provide no means of inspecting policies learnt by the agent and are not focused on creating a usable modality for teaching robot behaviour. In this paper, we propose a novel machine learning framework that enables humans to 1) specify, through natural language, interpretable policies in the form of easy-to-understand decision trees, 2) leverage these policies to warm-start reinforcement learning, and 3) obtain agents that outperform baselines lacking our natural language initialization mechanism. We train our approach by collecting a first-of-its-kind corpus mapping free-form natural language policy descriptions to decision tree-based policies. We show that our novel framework translates natural language to decision trees with 96% and 97% accuracy on a held-out corpus in two domains, respectively. Finally, we validate that policies initialized with natural language commands significantly outperform relevant baselines (p < 0.001) that do not benefit from our natural language-based warm-start technique.
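To make "interpretable policies in the form of easy-to-understand decision trees" concrete, the following is a minimal sketch, assuming a binary tree in which each internal node compares one state feature against a threshold and each leaf stores an action. The feature names (`enemy_distance`, `health`), thresholds, actions, and helper functions (`act`, `to_rules`) are hypothetical illustrations, not taken from the paper's corpus or implementation. The sketch shows the two properties the abstract relies on: the tree can be executed as a policy, and it can be read back as plain if-then rules.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class Leaf:
    """Terminal node holding the action the policy takes (hypothetical format)."""
    action: str


@dataclass
class Node:
    """Internal node: go left if state[feature] <= threshold, else go right."""
    feature: str
    threshold: float
    left: "Tree"
    right: "Tree"


Tree = Union[Node, Leaf]


def act(tree: Tree, state: Dict[str, float]) -> str:
    """Execute the policy by following the root-to-leaf path chosen by the state."""
    while isinstance(tree, Node):
        tree = tree.left if state[tree.feature] <= tree.threshold else tree.right
    return tree.action


def to_rules(tree: Tree, conditions: Optional[List[str]] = None) -> List[str]:
    """Flatten the tree into human-readable if-then rules, one rule per leaf."""
    conditions = conditions or []
    if isinstance(tree, Leaf):
        premise = " and ".join(conditions) if conditions else "always"
        return [f"if {premise}: {tree.action}"]
    return (to_rules(tree.left, conditions + [f"{tree.feature} <= {tree.threshold}"])
            + to_rules(tree.right, conditions + [f"{tree.feature} > {tree.threshold}"]))


# Hypothetical tree for the instruction: "Attack when an enemy is near;
# otherwise retreat if health is low, and explore if it is not."
policy = Node("enemy_distance", 2.0,
              Leaf("attack"),
              Node("health", 0.3, Leaf("retreat"), Leaf("explore")))

print(act(policy, {"enemy_distance": 5.0, "health": 0.2}))  # retreat
for rule in to_rules(policy):
    print(rule)
```

This sketch covers only executing and inspecting a tree; how such a tree is turned into a trainable policy to warm-start RL is not specified in the abstract and is not shown here.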
