GenSafe: A Generalizable Safety Enhancer for Safe Reinforcement Learning Algorithms Based on Reduced Order Markov Decision Process Model

Abstract

Safe Reinforcement Learning (SRL) aims to realize a safe learning process for Deep Reinforcement Learning (DRL) algorithms by incorporating safety constraints. However, the efficacy of SRL approaches often relies on accurate function approximations, which are notably challenging to achieve in the early learning stages due to data insufficiency. To address this issue, we introduce in this work a novel Generalizable Safety enhancer (GenSafe) that is able to overcome the challenge of data insufficiency and enhance the performance of SRL approaches. Leveraging model order reduction techniques, we first propose an innovative method to construct a Reduced Order Markov Decision Process (ROMDP) as a low-dimensional approximator of the original safety constraints. Then, by solving the reformulated ROMDP-based constraints, GenSafe refines the actions of the agent to increase the possibility of constraint satisfaction. Essentially, GenSafe acts as an additional safety layer for SRL algorithms. We evaluate GenSafe on multiple SRL approaches and benchmark problems. The results demonstrate its capability to improve safety performance, especially in the early learning phases, while maintaining satisfactory task performance. Our proposed GenSafe not only offers a novel measure to augment existing SRL methods but also shows broad compatibility with various SRL algorithms, making it applicable to a wide range of systems and SRL problems.
