Reinforcement Learning with Convex Constraints

Abstract

In standard reinforcement learning (RL), a learning agent seeks to optimize the overall reward. However, many key aspects of a desired behavior are more naturally expressed as constraints. For instance, the designer may want to limit the use of unsafe actions, increase the diversity of trajectories to enable exploration, or approximate expert trajectories when rewards are sparse. In this paper, we propose an algorithmic scheme that can handle a wide class of constraints in RL tasks: specifically, any constraints that require expected values of some vector measurements (such as the use of an action) to lie in a convex set. This captures previously studied constraints (such as safety and proximity to an expert), but also enables new classes of constraints (such as diversity). Our approach comes with rigorous theoretical guarantees and only relies on the ability to approximately solve standard RL tasks. As a result, it can be easily adapted to work with any model-free or model-based RL. In our experiments, we show that it matches previous algorithms that enforce safety via constraints, but can also enforce new properties that these algorithms do not incorporate, such as diversity.
