Abstract
Deep reinforcement learning (DRL) algorithms are increasingly being used in safety-critical systems. Ensuring the safety of DRL agents is a critical concern in such contexts. However, relying solely on testing is not sufficient to ensure safety, as it does not offer guarantees. Building safety monitors is one solution to alleviate this challenge. This paper proposes SMARLA, a machine learning-based safety monitoring approach designed for DRL agents. For practical reasons, SMARLA is agnostic to the type of the DRL agent's inputs. Further, it is designed to be black-box (as it does not require access to the internals or training data of the agent) by leveraging state abstraction to facilitate the learning of safety violation prediction models from the agent's states using a reduced state space. We quantitatively and qualitatively validated SMARLA on three well-known RL case studies. Empirical results reveal that SMARLA achieves accurate violation prediction with a low false positive rate and can predict safety violations at an early stage, approximately halfway through the execution of the agent, before violations occur.
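To make the idea of monitoring over an abstracted state space concrete, the following is a minimal illustrative sketch and not SMARLA's actual abstraction or prediction model. It assumes a hypothetical grid-based abstraction (`abstract_state` with a `bin_width` parameter) and a simple frequency-based risk estimate in place of a learned classifier; the class name `ViolationMonitor` and all parameters are invented for illustration. The monitor only observes the agent's states, which reflects the black-box setting described above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

State = Sequence[float]
AbstractState = Tuple[int, ...]


def abstract_state(state: State, bin_width: float = 0.5) -> AbstractState:
    """Map a concrete state to a coarse grid cell (hypothetical abstraction)."""
    return tuple(int(x // bin_width) for x in state)


class ViolationMonitor:
    """Frequency-based stand-in for a learned violation prediction model."""

    def __init__(self, bin_width: float = 0.5, threshold: float = 0.8) -> None:
        self.bin_width = bin_width
        self.threshold = threshold
        # Per abstract state: number of episodes visiting it, and how many
        # of those episodes ended in a safety violation.
        self.seen: Dict[AbstractState, int] = defaultdict(int)
        self.unsafe: Dict[AbstractState, int] = defaultdict(int)

    def _cells(self, states: Iterable[State]) -> set:
        return {abstract_state(s, self.bin_width) for s in states}

    def fit(self, episodes: List[List[State]], violated: List[bool]) -> None:
        """Count visits per abstract state from labeled episodes."""
        for states, bad in zip(episodes, violated):
            for cell in self._cells(states):
                self.seen[cell] += 1
                self.unsafe[cell] += int(bad)

    def risk(self, prefix: List[State]) -> float:
        """Highest empirical violation rate among abstract states seen so far."""
        rates = [
            self.unsafe[cell] / self.seen[cell]
            for cell in self._cells(prefix)
            if cell in self.seen  # unseen cells contribute no evidence
        ]
        return max(rates, default=0.0)

    def predicts_violation(self, prefix: List[State]) -> bool:
        """Flag the running episode once the estimated risk crosses the threshold."""
        return self.risk(prefix) >= self.threshold
```

In use, such a monitor would be queried after every step with the states observed so far, so that a violation can be flagged partway through an episode rather than only at its end.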