Abstract
A common formulation of constrained reinforcement learning involves multiple rewards that must individually accumulate to given thresholds. In this class of problems, we show a simple example in which the desired optimal policy cannot be induced by any weighted linear combination of rewards. Hence, there exist constrained reinforcement learning problems for which neither regularized nor classical primal-dual methods yield optimal policies. This work addresses this shortcoming by augmenting the state with Lagrange multipliers and reinterpreting primal-dual methods as the portion of the dynamics that drives the multipliers' evolution. This approach provides a systematic state augmentation procedure that is guaranteed to solve reinforcement learning problems with constraints. Thus, as we illustrate by an example, while previous methods can fail to find optimal policies, running the dual dynamics while executing the augmented policy yields an algorithm that provably samples actions from the optimal policy.
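The failure mode and its remedy can be illustrated with a minimal sketch. The toy problem below is an assumption made for illustration, not the example from the paper: a single state, two actions, and two rewards, where action 0 yields only reward r0 and action 1 yields only reward r1, with the constraint that r1 must average at least 0.5. The names `REWARDS`, `greedy_action`, and `run`, as well as the step size and threshold, are hypothetical. For any fixed multiplier, the greedy policy on the weighted reward picks a single action forever, so it is either infeasible or suboptimal; letting the multiplier evolve under a projected dual gradient step makes the executed actions alternate so that the constraint is met on average.

```python
# Toy, hypothetical problem (not taken from the paper):
# one state, two actions, two rewards. action -> (r0, r1)
REWARDS = {0: (1.0, 0.0), 1: (0.0, 1.0)}


def greedy_action(lam):
    """Action maximizing the weighted reward r0 + lam * r1."""
    return max(REWARDS, key=lambda a: REWARDS[a][0] + lam * REWARDS[a][1])


def run(steps=10_000, eta=0.05, threshold=0.5, lam=0.0):
    """Return time-averaged (r0, r1); eta=0 keeps the multiplier fixed."""
    total_r0 = total_r1 = 0.0
    for _ in range(steps):
        # Act greedily with respect to the current multiplier, which
        # plays the role of the augmented part of the state.
        r0, r1 = REWARDS[greedy_action(lam)]
        total_r0 += r0
        total_r1 += r1
        # Dual dynamics: projected gradient step on the multiplier.
        lam = max(0.0, lam - eta * (r1 - threshold))
    return total_r0 / steps, total_r1 / steps


if __name__ == "__main__":
    print("fixed lam=0.5:", run(eta=0.0, lam=0.5))  # always action 0
    print("fixed lam=2.0:", run(eta=0.0, lam=2.0))  # always action 1
    print("dual dynamics:", run())                  # mixes both actions
```

With a fixed multiplier the averages are (1, 0) or (0, 1): the first violates the constraint and the second gives up all of r0. With the multiplier updated online, both averages settle near 0.5, which is the best r0 attainable under the constraint in this toy setting.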