Think Smart, Act SMARL! Analyzing Probabilistic Logic Shields for Multi-Agent Reinforcement Learning

Abstract

Safe reinforcement learning (RL) is crucial for real-world applications, and multi-agent interactions introduce additional safety challenges. While Probabilistic Logic Shields (PLS) have been a powerful approach for enforcing safety in single-agent RL, their generalizability to multi-agent settings remains unexplored. In this paper, we address this gap by conducting extensive analyses of PLS within decentralized, multi-agent environments, and in doing so, propose Shielded Multi-Agent Reinforcement Learning (SMARL) as a general framework for steering MARL towards norm-compliant outcomes. Our key contributions are: (1) a novel Probabilistic Logic Temporal Difference (PLTD) update for shielded, independent Q-learning, which incorporates probabilistic constraints directly into the value update process; (2) a probabilistic logic policy gradient method for shielded PPO with formal safety guarantees for MARL; and (3) a comprehensive evaluation across symmetrically and asymmetrically shielded $n$-player game-theoretic benchmarks, demonstrating fewer constraint violations and significantly better cooperation under normative constraints. These results position SMARL as an effective mechanism for equilibrium selection, paving the way toward safer, socially aligned multi-agent systems.
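To make the idea of a probabilistic shield concrete, here is a minimal sketch. It assumes the single-agent PLS formulation from prior work: the shielded policy reweights each action by its probability of being safe and renormalizes by the state's overall safety probability, i.e. $\pi^{+}(a \mid s) = \pi(a \mid s)\,P(\text{safe} \mid s, a) / P(\text{safe} \mid s)$ with $P(\text{safe} \mid s) = \sum_{a'} \pi(a' \mid s)\,P(\text{safe} \mid s, a')$. This is an illustration, not the SMARL implementation. The function name `shield_policy` is hypothetical. In PLS the per-action safety probabilities would come from a probabilistic logic program; here they are simply passed in as an array.

```python
import numpy as np

def shield_policy(base_probs, safe_probs):
    """Hypothetical helper: apply a probabilistic shield to one state's policy.

    base_probs: pi(a|s) for each action, non-negative, summing to 1.
    safe_probs: P(safe|s,a) for each action, values in [0, 1]
                (assumed to be supplied by a probabilistic logic program).
    Returns (pi_plus, p_safe_state), where
      p_safe_state = sum_a pi(a|s) * P(safe|s,a)
      pi_plus(a|s) = pi(a|s) * P(safe|s,a) / p_safe_state
    """
    base = np.asarray(base_probs, dtype=float)
    safe = np.asarray(safe_probs, dtype=float)
    # Probability that the unshielded policy takes a safe action in this state.
    p_safe_state = float(np.dot(base, safe))
    if p_safe_state <= 0.0:
        raise ValueError("every action has zero safety probability")
    # Shift probability mass toward safer actions, then renormalize.
    pi_plus = base * safe / p_safe_state
    return pi_plus, p_safe_state

# Two actions (e.g. "cooperate", "defect"); a hypothetical norm marks the
# second action as likely unsafe.
pi_plus, p_safe = shield_policy([0.5, 0.5], [1.0, 0.2])
print(p_safe, pi_plus)
```

Here $P(\text{safe} \mid s) = 0.5 \cdot 1.0 + 0.5 \cdot 0.2 = 0.6$, so the shielded probabilities are $0.5/0.6 \approx 0.833$ and $0.1/0.6 \approx 0.167$. Because $\pi^{+}$ is a differentiable function of $\pi$, a shield of this form can sit inside a gradient-based policy update rather than acting only as a post-hoc action filter.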
