A Lyapunov-based Approach to Safe Reinforcement Learning

Abstract

In many real-world reinforcement learning (RL) problems, besides optimizing the main objective function, an agent must concurrently avoid violating a number of constraints. In particular, besides optimizing performance it is crucial to guarantee the safety of an agent during training as well as deployment (e.g., a robot should avoid taking actions - exploratory or not - which irrevocably harm its hardware). To incorporate safety in RL, we derive algorithms under the framework of constrained Markov decision problems (CMDPs), an extension of the standard Markov decision problems (MDPs) augmented with constraints on expected cumulative costs. Our approach hinges on a novel Lyapunov method. We define and present a method for constructing Lyapunov functions, which provide an effective way to guarantee the global safety of a behavior policy during training via a set of local, linear constraints. Leveraging these theoretical underpinnings, we show how to use the Lyapunov approach to systematically transform dynamic programming (DP) and RL algorithms into their safe counterparts. To illustrate their effectiveness, we evaluate these algorithms in several CMDP planning and decision-making tasks on a safety benchmark domain. Our results show that our proposed method significantly outperforms existing baselines in balancing constraint satisfaction and performance.
