Robust Multi-Agent Control via Maximum Entropy Heterogeneous-Agent Reinforcement Learning

Abstract

In multi-agent reinforcement learning (MARL), optimal control with robustness guarantees is critical for real-world deployment. However, existing methods face challenges related to sample complexity, training instability, potential convergence to suboptimal Nash equilibria, and lack of robustness to multiple types of perturbation. In this paper, we propose a unified framework for learning \emph{stochastic} policies to resolve these issues. We embed cooperative MARL problems into probabilistic graphical models, from which we derive the maximum entropy (MaxEnt) objective that is optimal for MARL. Based on the MaxEnt framework, we propose the \emph{Heterogeneous-Agent Soft Actor-Critic} (HASAC) algorithm. Theoretically, we prove that HASAC enjoys monotonic improvement and converges to a \emph{quantal response equilibrium} (QRE). Furthermore, HASAC is provably robust against a wide range of real-world uncertainties, including perturbations in rewards, environment dynamics, states, and actions. Finally, we generalize these results into a unified template for MaxEnt algorithmic design named \emph{Maximum Entropy Heterogeneous-Agent Mirror Learning} (MEHAML), which provides any induced method with the same guarantees as HASAC. We evaluate HASAC on seven benchmarks: Bi-DexHands, Multi-Agent MuJoCo, Pursuit-Evade, StarCraft Multi-Agent Challenge, Google Research Football, Multi-Agent Particle Environment, and Light Aircraft Game. Results show that HASAC consistently outperforms strong baselines in 34 out of 38 tasks, exhibiting improved training stability, better sample efficiency, and sufficient exploration. The robustness of HASAC was further validated under uncertainties in rewards, dynamics, states, and actions at 14 magnitudes, and in real-world deployment in a multi-robot arena against these four types of uncertainty. See our page at \url{https://sites.google.com/view/meharl}.
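The notion of a quantal response equilibrium (QRE) can be illustrated on a standard toy case. The sketch below is not HASAC. It assumes a two-agent cooperative matrix game with a hypothetical payoff matrix `game` and computes a logit QRE, in which each agent plays a Boltzmann (softmax) response to its expected reward against the other agent's strategy. Maximizing expected reward plus `temperature` times the policy entropy yields exactly this softmax response, which is the single-state analogue of a MaxEnt objective. The helper names (`logit_qre`, `qre_residual`) and the damped fixed-point iteration are illustrative choices, and the iteration is not guaranteed to converge in every game.

```python
import numpy as np


def softmax(x: np.ndarray, temperature: float) -> np.ndarray:
    """Numerically stable Boltzmann (logit) distribution over action values."""
    z = x / temperature
    z = z - z.max()  # shifting by a constant does not change the distribution
    e = np.exp(z)
    return e / e.sum()


def logit_qre(payoff: np.ndarray, temperature: float = 1.0,
              damping: float = 0.5, max_iters: int = 10_000,
              tol: float = 1e-12):
    """Damped fixed-point search for a logit QRE of a 2-agent cooperative game.

    payoff[i, j] is the shared reward when agent 1 plays i and agent 2 plays j.
    A fixed point satisfies p1 = softmax(payoff @ p2), p2 = softmax(payoff.T @ p1),
    i.e. each agent plays its entropy-regularized best response to the other.
    Convergence of this iteration is an assumption, not a guarantee.
    """
    n1, n2 = payoff.shape
    p1 = np.full(n1, 1.0 / n1)  # start both agents at the uniform policy
    p2 = np.full(n2, 1.0 / n2)
    for _ in range(max_iters):
        q1 = payoff @ p2    # expected reward of each agent-1 action against p2
        q2 = payoff.T @ p1  # expected reward of each agent-2 action against p1
        new_p1 = (1 - damping) * p1 + damping * softmax(q1, temperature)
        new_p2 = (1 - damping) * p2 + damping * softmax(q2, temperature)
        delta = max(np.abs(new_p1 - p1).max(), np.abs(new_p2 - p2).max())
        p1, p2 = new_p1, new_p2
        if delta < tol:
            break
    return p1, p2


def qre_residual(payoff: np.ndarray, p1: np.ndarray, p2: np.ndarray,
                 temperature: float) -> float:
    """Max deviation of (p1, p2) from their logit responses; near 0 at a QRE."""
    r1 = np.abs(p1 - softmax(payoff @ p2, temperature)).max()
    r2 = np.abs(p2 - softmax(payoff.T @ p1, temperature)).max()
    return float(max(r1, r2))


if __name__ == "__main__":
    # Hypothetical coordination game: both agents prefer jointly picking action 0.
    game = np.array([[2.0, 0.0],
                     [0.0, 1.0]])
    for t in (0.1, 1.0, 100.0):
        p1, p2 = logit_qre(game, temperature=t)
        print(t, p1.round(4), p2.round(4), qre_residual(game, p1, p2, t))
```

In this game, a small temperature drives both agents toward the coordinated action 0, while a large temperature pushes both toward the uniform policy, mirroring how an entropy weight trades off exploitation against exploration.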
