Abstract
Enabling safe exploration by reinforcement learning (RL) agents during training is a critical challenge for their deployment in many real-world scenarios. When prior knowledge of the domain or task is unavailable, training RL agents in unknown, \textit{black-box} environments presents an even greater safety risk. We introduce \mbox{ADVICE} (Adaptive Shielding with a Contrastive Autoencoder), a novel post-shielding technique that distinguishes safe and unsafe features of state-action pairs during training, and uses this knowledge to protect the RL agent from executing actions that are likely to yield hazardous outcomes. Our comprehensive experimental evaluation against state-of-the-art safe RL exploration techniques shows that ADVICE significantly reduces safety violations ($\approx\!\!50\%$) during training, with a competitive outcome reward compared to other techniques.