Energy-based Surprise Minimization for Multi-Agent Value Factorization

Abstract

Multi-Agent Reinforcement Learning (MARL) has demonstrated significant success in training decentralised policies in a centralised manner by making use of value factorization methods. However, addressing surprise across spurious states and approximation bias remain open problems for multi-agent settings. We introduce the Energy-based MIXer (EMIX), an algorithm which minimizes surprise utilizing the energy across agents. Our contributions are threefold: (1) EMIX introduces a novel surprise minimization technique across multiple agents in the case of multi-agent partially-observable settings; (2) EMIX highlights the first practical use of energy functions in MARL (to our knowledge) with theoretical guarantees and experimental validation of the energy operator; and (3) EMIX presents a novel technique for addressing overestimation bias across agents in MARL. When evaluated on a range of challenging StarCraft II micromanagement scenarios, EMIX demonstrates consistent state-of-the-art performance for multi-agent surprise minimization. Moreover, our ablation study highlights the necessity of the energy-based scheme and the need for elimination of overestimation bias in MARL. Our implementation of EMIX and videos of agents are available at https://karush17.github.io/emix-web/.
