Reachability Constrained Reinforcement Learning

Abstract

Constrained Reinforcement Learning (CRL) has gained significant interest recently, since the satisfaction of safety constraints is critical for real-world problems. However, existing CRL methods that constrain discounted cumulative costs generally lack a rigorous definition and guarantee of safety. In safe control research, on the other hand, safety is defined as persistently satisfying certain state constraints. Such persistent safety is possible only on a subset of the state space, called the feasible set, and an optimal largest feasible set exists for a given environment. Recent studies that combine safe control with CRL through energy-based methods, such as the control barrier function (CBF) and the safety index (SI), rely on prior conservative estimates of feasible sets, which harms the performance of the learned policy. To deal with this problem, this paper proposes a reachability CRL (RCRL) method that uses reachability analysis to characterize the largest feasible sets. We characterize the feasible set by the established self-consistency condition, so that a safety value function can be learned and used as a constraint in CRL. We also use multi-time scale stochastic approximation theory to prove that the proposed algorithm converges to a local optimum, where the largest feasible set can be guaranteed. Empirical results on different benchmarks such as safe-control-gym and Safety-Gym validate the learned feasible set, the performance in terms of the optimality criterion, and the constraint satisfaction of RCRL, compared to state-of-the-art CRL baselines.
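To make the idea of a feasible set concrete, the following is a minimal tabular sketch, not the RCRL algorithm itself. The abstract does not state the self-consistency condition, so the sketch assumes the standard undiscounted reachability fixed point V(s) = max{h(s), min_a V(s')}, where h(s) <= 0 means the state constraint holds. The 10-state chain, its drift, and the names `h`, `step`, and `safety_value_iteration` are all hypothetical choices made for illustration.

```python
import numpy as np

N_STATES = 10          # hypothetical 1-D chain with states 0..9
ACTIONS = (-1, 0, 1)   # move left, stay, move right


def h(s):
    """Assumed constraint function: h(s) <= 0 means the state constraint holds."""
    return s - 7.5      # states 8 and 9 violate the constraint


def step(s, a):
    """Hypothetical deterministic dynamics with an uncontrollable drift near the edge."""
    drift = 2 if s >= 6 else 0   # from state 6 onward the agent is pushed right
    return int(np.clip(s + a + drift, 0, N_STATES - 1))


def safety_value_iteration(max_iters=100):
    """Iterate V(s) = max(h(s), min_a V(step(s, a))) until it stops changing."""
    V = np.array([h(s) for s in range(N_STATES)], dtype=float)
    for _ in range(max_iters):
        V_new = np.array([
            max(h(s), min(V[step(s, a)] for a in ACTIONS))
            for s in range(N_STATES)
        ])
        if np.array_equal(V_new, V):
            break
        V = V_new
    return V


V = safety_value_iteration()
# States that satisfy the constraint right now.
constraint_ok = [s for s in range(N_STATES) if h(s) <= 0]
# States from which the constraint can be satisfied forever (V <= 0).
feasible = [s for s in range(N_STATES) if V[s] <= 0]
print("constraint satisfied now:", constraint_ok)
print("feasible set:", feasible)
```

In this toy system, states 6 and 7 satisfy the constraint at the current step but are not in the feasible set, because the drift forces every trajectory from them into a violating state. That gap between momentary constraint satisfaction and persistent safety is what the safety value function is meant to capture.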
