Safe Reinforcement Learning with Natural Language Constraints

Abstract

In this paper, we tackle the problem of learning control policies for tasks when provided with constraints in natural language. In contrast to instruction following, language here is used not to specify goals, but rather to describe situations that an agent must avoid during its exploration of the environment. Specifying constraints in natural language also differs from the predominant paradigm in safe reinforcement learning, where safety criteria are enforced by hand-defined cost functions. While natural language allows for easy and flexible specification of safety constraints and budget limitations, its ambiguous nature presents a challenge when mapping these specifications into representations that can be used by techniques for safe reinforcement learning. To address this, we develop a model that contains two components: (1) a constraint interpreter to encode natural language constraints into vector representations capturing spatial and temporal information on forbidden states, and (2) a policy network that uses these representations to output a policy with minimal constraint violations. Our model is end-to-end differentiable and we train it using a recently proposed algorithm for constrained policy optimization. To empirically demonstrate the effectiveness of our approach, we create a new benchmark task for autonomous navigation with crowd-sourced free-form text specifying three different types of constraints. Our method outperforms several baselines by achieving 6-7 times higher returns and 76% fewer constraint violations on average. Dataset and code to reproduce our experiments are available at https://sites.google.com/view/polco-hazard-world/.
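
As a rough illustration of the two-component structure described above, the following minimal sketch wires a toy constraint interpreter into a toy policy. It is not the paper's model: the hashed bag-of-words encoder, the single linear layer, the dimensions, the function names, and the example constraint sentence are all assumptions made for illustration, and no training or constrained optimization step is shown.

```python
import math
import random

# Hypothetical sizes chosen for illustration only; none come from the paper.
EMBED_DIM = 8
NUM_ACTIONS = 4


def interpret_constraint(text, dim=EMBED_DIM):
    """Toy stand-in for a constraint interpreter: hashed bag-of-words vector."""
    vec = [0.0] * dim
    for word in text.lower().split():
        # Deterministic bucket from character codes, independent of hash seeding.
        bucket = sum(ord(ch) for ch in word) % dim
        vec[bucket] += 1.0
    total = sum(vec)
    # Normalize so the entries sum to 1 (left as zeros for empty text).
    return [v / total for v in vec] if total else vec


def make_weights(num_features, num_actions=NUM_ACTIONS, seed=0):
    """Small random weight matrix, one row of weights per action."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(num_features)]
            for _ in range(num_actions)]


def policy(observation, constraint_vec, weights):
    """Toy stand-in for a policy network: linear layer over the concatenated
    observation and constraint vector, followed by a softmax over actions."""
    features = list(observation) + list(constraint_vec)
    logits = [sum(w * x for w, x in zip(row, features)) for row in weights]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


# Made-up observation and constraint text, purely for demonstration.
observation = [0.5, -1.0, 0.25]
constraint = interpret_constraint("Do not step on lava more than three times")
weights = make_weights(len(observation) + len(constraint))
action_probs = policy(observation, constraint, weights)
```

The point of the sketch is only the data flow: the constraint text is encoded once into a fixed-size vector, and the policy conditions on that vector together with the observation when producing action probabilities.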
