Safety Filtering While Training: Improving the Performance and Sample Efficiency of Reinforcement Learning Agents

Abstract

Reinforcement learning (RL) controllers are flexible and performant but rarely guarantee safety. Safety filters impart hard safety guarantees to RL controllers while maintaining flexibility. However, safety filters can cause undesired behaviours due to the separation between the controller and the safety filter, often degrading performance and robustness. In this paper, we propose several modifications that incorporate the safety filter into the training of RL controllers rather than solely applying it during evaluation. The modifications allow the RL controller to learn to account for the safety filter, improving performance. Additionally, our modifications significantly improve sample efficiency and eliminate training-time constraint violations. We verified the proposed modifications in simulated and real experiments with a Crazyflie 2.0 drone. In experiments, we show that the proposed training approaches require significantly fewer environment interactions and improve performance by up to 20% compared to standard RL training.
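To make the idea of filtering during training concrete, the following is a minimal illustrative sketch and not the paper's implementation. It assumes a hypothetical one-dimensional single-integrator system with box constraints on state and input, a clipping-based safety filter, and a reward term that penalizes the filter's correction; the names `safety_filter`, `rollout`, and `correction_penalty`, as well as all numeric values, are assumptions made for illustration.

```python
import random

DT = 0.1     # integration step (assumed value)
X_MAX = 1.0  # state constraint |x| <= X_MAX (assumed value)
U_MAX = 2.0  # input constraint |u| <= U_MAX (assumed value)


def safety_filter(x, u_proposed):
    """Return the admissible input closest to u_proposed.

    For the toy dynamics x_next = x + DT * u, the next state stays inside
    [-X_MAX, X_MAX] exactly when u lies in [lo, hi], so clipping is the
    minimal-intervention correction in one dimension.
    """
    lo = max(-U_MAX, (-X_MAX - x) / DT)
    hi = min(U_MAX, (X_MAX - x) / DT)
    return min(max(u_proposed, lo), hi)


def step(x, u):
    """Toy single-integrator dynamics (hypothetical stand-in for a drone model)."""
    return x + DT * u


def rollout(policy, n_steps=200, x0=0.0, correction_penalty=1.0):
    """Run one training episode with the filter in the loop.

    Returns the accumulated reward and the number of constraint violations.
    """
    x, violations, total_reward = x0, 0, 0.0
    for _ in range(n_steps):
        u_rl = policy(x)                 # action proposed by the RL policy
        u_safe = safety_filter(x, u_rl)  # action actually applied
        x = step(x, u_safe)
        violations += abs(x) > X_MAX + 1e-9  # small tolerance for rounding
        # Hypothetical reward: track x = 0.5 and penalize filter corrections,
        # so the policy is exposed to the filter's effect while learning.
        total_reward += -abs(x - 0.5) - correction_penalty * abs(u_safe - u_rl)
    return total_reward, violations
```

With a deliberately aggressive random policy such as `lambda x: random.uniform(-5, 5)`, every applied action passes through the filter, so the rollout in this toy setting records no constraint violations, while the correction penalty gives the learner a signal about how often it is being overridden.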
