Abstract
Shielding is widely used to enforce safety in reinforcement learning (RL), ensuring that an agent's actions remain compliant with formal specifications. Classical shielding approaches, however, are often static, in the sense that they assume fixed logical specifications and hand-crafted abstractions. While these static shields provide safety under nominal assumptions, they fail to adapt when environment assumptions are violated. In this paper, we develop what is, to the best of our knowledge, the first adaptive shielding framework based on Generalized Reactivity of rank 1 (GR(1)) specifications, a tractable and expressive fragment of Linear Temporal Logic (LTL) that captures both safety and liveness properties. Our method detects environment assumption violations at runtime and employs Inductive Logic Programming (ILP) to repair GR(1) specifications automatically and online, in a systematic and interpretable way. As a result, the shield evolves gracefully: it keeps liveness achievable and weakens goals only when necessary. We evaluate our approach on two case studies, Minepump and Atari Seaquest, and show that (i) static symbolic controllers are often severely suboptimal when optimizing for auxiliary rewards, and (ii) compared with static shields, RL agents equipped with our adaptive shield maintain near-optimal reward and perfect logical compliance.