Safe Reinforcement Learning in Black-Box Environments via Adaptive Shielding

  • 2025-01-31 10:45:55
  • Daniel Bethell, Simos Gerasimou, Radu Calinescu, Calum Imrie

Abstract

Empowering safe exploration of reinforcement learning (RL) agents during training is a critical challenge towards their deployment in many real-world scenarios. When prior knowledge of the domain or task is unavailable, training RL agents in unknown, black-box environments presents an even greater safety risk. We introduce ADVICE (Adaptive Shielding with a Contrastive Autoencoder), a novel post-shielding technique that distinguishes safe and unsafe features of state-action pairs during training, and uses this knowledge to protect the RL agent from executing actions that yield likely hazardous outcomes. Our comprehensive experimental evaluation against state-of-the-art safe RL exploration techniques shows that ADVICE significantly reduces safety violations (by approximately 50%) during training, with a competitive outcome reward compared to other techniques.

 
