Lyapunov-based uncertainty-aware safe reinforcement learning

Abstract

Reinforcement learning (RL) has shown promising performance in learning optimal policies for a variety of sequential decision-making tasks. However, in many real-world RL problems, besides optimizing the main objectives, the agent is expected to satisfy a certain level of safety (e.g., avoiding collisions in autonomous driving). While RL problems are commonly formalized as Markov decision processes (MDPs), safety constraints are incorporated via constrained Markov decision processes (CMDPs). Although recent advances in safe RL have enabled learning safe policies in CMDPs, these safety requirements should be satisfied during both training and deployment. Furthermore, it has been shown that in memory-based and partially observable environments, these methods fail to maintain safety over unseen out-of-distribution observations. To address these limitations, we propose a Lyapunov-based uncertainty-aware safe RL model. The introduced model adopts a Lyapunov function that converts trajectory-based constraints to a set of local linear constraints. Furthermore, to ensure the safety of the agent in highly uncertain environments, an uncertainty quantification method is developed that enables identifying risk-averse actions by estimating the probability of constraint violations. Moreover, a Transformer model is integrated to provide the agent with memory to process long time horizons of information via the self-attention mechanism. The proposed model is evaluated in grid-world navigation tasks where safety is defined as avoiding static and dynamic obstacles in fully and partially observable environments. The results of these experiments show a significant improvement in the performance of the agent both in achieving optimality and in satisfying safety constraints.
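To make the idea of identifying risk-averse actions from an estimated probability of constraint violation more concrete, the following is a minimal illustrative sketch, not the paper's implementation. It assumes that for each action we already have several sampled estimates of the cumulative cost (for example from an ensemble or repeated stochastic forward passes), and it estimates the violation probability as the fraction of samples above a cost budget. The function names, the `budget` and `max_risk` parameters, and the fallback rule are all hypothetical choices made for illustration.

```python
from typing import Sequence


def violation_probability(cost_samples: Sequence[float], budget: float) -> float:
    """Empirical probability that the cost exceeds the budget.

    Assumption: cost_samples are independent estimates of the cumulative
    cost of one action; how they are produced is outside this sketch.
    """
    return sum(c > budget for c in cost_samples) / len(cost_samples)


def risk_averse_action(
    values: Sequence[float],
    cost_samples: Sequence[Sequence[float]],
    budget: float,
    max_risk: float,
) -> int:
    """Pick the highest-value action whose violation risk is acceptable.

    values[a] is the estimated return of action a, and cost_samples[a]
    holds the sampled cost estimates for action a (hypothetical inputs).
    """
    risks = [violation_probability(samples, budget) for samples in cost_samples]

    # Keep only actions whose estimated violation probability is tolerable.
    safe = [a for a, risk in enumerate(risks) if risk <= max_risk]

    if not safe:
        # Illustrative fallback: no action meets the risk threshold,
        # so choose the least risky one regardless of its value.
        return min(range(len(risks)), key=risks.__getitem__)

    return max(safe, key=values.__getitem__)


if __name__ == "__main__":
    values = [1.0, 5.0, 3.0]
    cost_samples = [
        [0.1, 0.2, 0.1, 0.3],  # never exceeds the budget
        [0.9, 1.2, 1.5, 0.8],  # exceeds the budget in half of the samples
        [0.4, 0.6, 1.1, 0.5],  # exceeds the budget in a quarter of the samples
    ]
    print(risk_averse_action(values, cost_samples, budget=1.0, max_risk=0.25))
```

In this toy example the action with the highest estimated return is rejected because half of its cost samples exceed the budget, so the selection falls to the best action among those within the risk threshold. A lower `max_risk` makes the choice more conservative.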
