First Order Optimization in Policy Space for Constrained Deep Reinforcement Learning

Abstract

In reinforcement learning, an agent attempts to learn high-performing behaviors through interaction with its environment; such behaviors are often quantified in the form of a reward function. However, some aspects of behavior, such as those deemed unsafe and to be avoided, are best captured through constraints. We propose a novel approach called First Order Constrained Optimization in Policy Space (FOCOPS), which maximizes an agent's overall reward while ensuring that the agent satisfies a set of cost constraints. Using data generated from the current policy, FOCOPS first finds the optimal update policy by solving a constrained optimization problem in the nonparameterized policy space. FOCOPS then projects the update policy back into the parametric policy space. Our approach provides a guarantee for constraint satisfaction throughout training and is first-order in nature, and therefore extremely simple to implement. We provide empirical evidence that our algorithm achieves better performance on a set of constrained robotics locomotive tasks than current state-of-the-art approaches.
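
The two-step structure described above (solve for an update policy in the nonparameterized space, then project it back into the parametric space) can be illustrated with a toy sketch. This is not the paper's implementation. It assumes a single state with four discrete actions, an exponentially tilted form for the nonparametric update, fixed multipliers `lam` and `nu`, and a softmax policy trained by plain gradient steps on a KL divergence. All function names, parameter names, and numbers are hypothetical.

```python
import math

def tilted_update(pi, adv, cost_adv, lam=1.0, nu=0.5):
    """Step 1 (assumed form): q(a) proportional to pi(a) * exp((adv - nu * cost_adv) / lam)."""
    scores = [(a - nu * c) / lam for a, c in zip(adv, cost_adv)]
    m = max(scores)  # subtract the max for numerical stability
    weights = [p * math.exp(s - m) for p, s in zip(pi, scores)]
    z = sum(weights)
    return [w / z for w in weights]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def kl(p, q):
    """KL(p || q) for discrete distributions with full support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def project(logits, q, lr=1.0, steps=50):
    """Step 2: first-order projection, gradient descent on KL(softmax(logits) || q)."""
    logits = list(logits)
    for _ in range(steps):
        p = softmax(logits)
        d = kl(p, q)
        # Gradient of KL(p || q) w.r.t. softmax logit j is p_j * (log(p_j / q_j) - KL).
        grad = [pj * (math.log(pj / qj) - d) for pj, qj in zip(p, q)]
        logits = [x - lr * g for x, g in zip(logits, grad)]
    return logits

# Hypothetical numbers: action 0 has the highest reward advantage but also a high cost.
logits0 = [0.0, 0.0, 0.0, 0.0]
pi0 = softmax(logits0)
adv = [1.0, 0.5, -0.5, -1.0]
cost_adv = [2.0, 0.0, 0.0, -1.0]

q = tilted_update(pi0, adv, cost_adv)
pi1 = softmax(project(logits0, q))
kl_before, kl_after = kl(pi0, q), kl(pi1, q)
```

In this toy setting the tilted target shifts probability toward the action with the best reward advantage after the cost penalty, and the projection uses only first-order gradient information. In a deep reinforcement learning setting the projection would instead be a minibatch gradient update on network parameters, and the multipliers would be adapted during training rather than fixed.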
