Compositional Shielding and Reinforcement Learning for Multi-Agent Systems

Abstract

Deep reinforcement learning has emerged as a powerful tool for obtaining high-performance policies. However, the safety of these policies has been a long-standing issue. One promising paradigm for guaranteeing safety is a shield, which prevents a policy from taking unsafe actions. However, the cost of computing a shield grows exponentially in the number of state variables. This is a particular concern in multi-agent systems with many agents. In this work, we propose a novel approach to multi-agent shielding. We address scalability by computing an individual shield for each agent. The challenge is that typical safety specifications are global properties, whereas the shields of individual agents only ensure local properties. Our key idea for overcoming this challenge is to apply assume-guarantee reasoning. Specifically, we present a sound proof rule that decomposes a (global, complex) safety specification into (local, simple) obligations for the shields of the individual agents. Moreover, we show that applying the shields during reinforcement learning significantly improves the quality of the policies obtained for a given training budget. We demonstrate the effectiveness and scalability of our multi-agent shielding framework in two case studies, reducing the computation time from hours to seconds and achieving fast learning convergence.
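To make the decomposition idea concrete, the following is a minimal toy sketch, not the paper's proof rule or shield-synthesis procedure. It assumes a hypothetical one-lane road where agents move forward with speeds in {0, 1, 2}, and a global specification that every pair of consecutive agents keeps a minimum gap. Each agent gets its own local shield that sees only its own position and that of the agent directly ahead. The shield's assumption is that the agent ahead never moves backward, and its guarantee is that the gap to the ahead agent's current position stays at least `min_gap`. Because every agent's actions satisfy the assumption, the local guarantees compose into the global property. All names (`make_local_shield`, `shielded_step`, `globally_safe`, `ACTIONS`) are illustrative.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical action space: speed in cells per step (never negative).
ACTIONS: Sequence[int] = (0, 1, 2)


def make_local_shield(
    min_gap: int, actions: Sequence[int] = ACTIONS
) -> Callable[[int, Optional[int], int], int]:
    """Build a shield for a single agent (toy setting, not the paper's method).

    Assume: the agent ahead never moves backward (holds since speeds are >= 0).
    Guarantee: after moving, this agent is at least `min_gap` cells behind the
    ahead agent's current position, and therefore also behind its next position.
    """

    def is_safe(pos: int, ahead_pos: Optional[int], a: int) -> bool:
        return ahead_pos is None or ahead_pos - (pos + a) >= min_gap

    def shield(pos: int, ahead_pos: Optional[int], proposed: int) -> int:
        if is_safe(pos, ahead_pos, proposed):
            return proposed  # the policy's choice passes through unchanged
        safe = [a for a in actions if is_safe(pos, ahead_pos, a)]
        if not safe:
            raise RuntimeError("no safe action: local invariant already violated")
        return max(safe)  # override with the fastest action that is still safe

    return shield


def globally_safe(positions: List[int], min_gap: int) -> bool:
    """Global specification: every consecutive pair of agents keeps `min_gap`."""
    return all(b - a >= min_gap for a, b in zip(positions, positions[1:]))


def shielded_step(
    positions: List[int], proposals: List[int], min_gap: int
) -> Tuple[List[int], List[int]]:
    """Apply one independent local shield per agent, then move all agents at once.

    `positions` must be sorted so that agent i+1 is directly ahead of agent i.
    """
    shield = make_local_shield(min_gap)
    actions = []
    for i, (pos, proposed) in enumerate(zip(positions, proposals)):
        ahead = positions[i + 1] if i + 1 < len(positions) else None
        actions.append(shield(pos, ahead, proposed))
    new_positions = [p + a for p, a in zip(positions, actions)]
    return actions, new_positions


actions, new_positions = shielded_step([0, 2, 5], [2, 2, 2], min_gap=1)
print(actions, new_positions)  # [1, 2, 2] [1, 4, 7]
assert globally_safe(new_positions, min_gap=1)
```

In this example only the rear agent's proposal is overridden (from 2 to 1); the other two proposals are already locally safe. Each shield reasons about one neighbor rather than the joint state of all agents, which is the kind of scalability gain the abstract describes, although real shields are computed over far richer state spaces than this toy.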
