State Augmented Constrained Reinforcement Learning: Overcoming the Limitations of Learning with Rewards

Abstract

Constrained reinforcement learning involves multiple rewards that must individually accumulate to given thresholds. In this class of problems, we show a simple example in which the desired optimal policy cannot be induced by any linear combination of rewards. Hence, there exist constrained reinforcement learning problems for which neither regularized nor classical primal-dual methods yield optimal policies. This work addresses this shortcoming by augmenting the state with Lagrange multipliers and reinterpreting primal-dual methods as the portion of the dynamics that drives the multipliers' evolution. This approach provides a systematic state augmentation procedure that is guaranteed to solve reinforcement learning problems with constraints. Thus, while primal-dual methods can fail at finding optimal policies, running the dual dynamics while executing the augmented policy yields an algorithm that provably samples actions from the optimal policy.
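
The idea of running the dual dynamics while executing a multiplier-dependent policy can be illustrated with a minimal toy sketch. This is not the paper's algorithm or its counterexample: the single-state environment (action i earns reward i only), the greedy policy on the multipliers, the function name `run_dual_dynamics`, and the step size are all assumptions made for illustration.

```python
import numpy as np


def run_dual_dynamics(thresholds, step_size=0.1, num_steps=2000):
    """Toy sketch: a multiplier-dependent greedy policy with projected dual updates.

    Hypothetical setup (not from the paper): one state, and action i yields
    reward 1 for constraint i and 0 for every other constraint.
    """
    thresholds = np.asarray(thresholds, dtype=float)
    num_constraints = thresholds.size
    multipliers = np.zeros(num_constraints)
    reward_sums = np.zeros(num_constraints)

    for _ in range(num_steps):
        # Augmented policy: the action is a function of the current multipliers,
        # which play the role of the extra state variables.
        action = int(np.argmax(multipliers))

        rewards = np.zeros(num_constraints)
        rewards[action] = 1.0
        reward_sums += rewards

        # Dual dynamics: a multiplier grows while its reward falls short of the
        # threshold and shrinks otherwise; it is projected to stay nonnegative.
        multipliers = np.maximum(
            multipliers - step_size * (rewards - thresholds), 0.0
        )

    return reward_sums / num_steps, multipliers


if __name__ == "__main__":
    averages, final_multipliers = run_dual_dynamics([0.4, 0.4])
    print("average rewards:", averages)
    print("final multipliers:", final_multipliers)
```

In this toy, a greedy policy with fixed weights would pick the same action at every step and leave one reward at zero, whereas letting the multipliers evolve makes the policy switch between actions so that both time-averaged rewards can reach their thresholds.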
