Provably Safe Reinforcement Learning via Action Projection using Reachability Analysis and Polynomial Zonotopes

Abstract

While reinforcement learning produces very promising results for many applications, its main disadvantage is the lack of safety guarantees, which prevents its use in safety-critical systems. In this work, we address this issue with a safety shield for nonlinear continuous systems that solve reach-avoid tasks. Our safety shield prevents potentially unsafe actions proposed by a reinforcement learning agent from being applied by projecting the proposed action to the closest safe action. This approach is called action projection and is implemented via mixed-integer optimization. The safety constraints for action projection are obtained by applying parameterized reachability analysis using polynomial zonotopes, which makes it possible to accurately capture the nonlinear effects of the actions on the system. In contrast to other state-of-the-art approaches for action projection, our safety shield can efficiently handle input constraints and dynamic obstacles, eases incorporation of the spatial robot dimensions into the safety constraints, guarantees robust safety despite process noise and measurement errors, and is well suited for high-dimensional systems, as we demonstrate on several challenging benchmark systems.
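
The idea of projecting a proposed action to the closest safe action can be illustrated with a deliberately simplified sketch. It is not the method of the paper, which solves a mixed-integer optimization problem with constraints derived from reachability analysis. Here we assume, purely for illustration, a one-dimensional action and a safe set that is already given as a union of closed intervals; the name `project_to_safe_action` and the interval representation are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound), assumed closed


def project_to_safe_action(proposed: float, safe_intervals: List[Interval]) -> float:
    """Return the point in a union of closed intervals closest to `proposed`.

    Toy stand-in for action projection: the safe set is assumed to be known
    in advance, whereas the paper obtains it from reachability analysis.
    """
    if not safe_intervals:
        raise ValueError("no safe action available")

    best_action = None
    best_distance = float("inf")
    for low, high in safe_intervals:
        if low > high:
            raise ValueError(f"invalid interval: ({low}, {high})")
        # Closest point of a single interval is obtained by clamping.
        candidate = min(max(proposed, low), high)
        distance = abs(candidate - proposed)
        # Keep the closest candidate over all intervals (ties: first wins).
        if distance < best_distance:
            best_action, best_distance = candidate, distance
    return best_action


if __name__ == "__main__":
    # Hypothetical safe set with a gap, e.g. actions that would hit an obstacle.
    safe = [(-1.0, 0.2), (0.9, 1.0)]
    print(project_to_safe_action(0.5, safe))  # unsafe proposal, moved to an interval edge
    print(project_to_safe_action(0.0, safe))  # already safe, returned unchanged
```

Because the safe set in this sketch is a union of disjoint pieces, it is non-convex; choosing which piece to project onto is the kind of discrete decision that a mixed-integer formulation can encode in the general, multi-dimensional case.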
