Kaleidoscope: Learnable Masks for Heterogeneous Multi-agent Reinforcement Learning

Abstract

In multi-agent reinforcement learning (MARL), parameter sharing is commonly employed to enhance sample efficiency. However, the popular approach of full parameter sharing often leads to homogeneous policies among agents, potentially limiting the performance benefits that could be derived from policy diversity. To address this critical limitation, we introduce \emph{Kaleidoscope}, a novel adaptive partial parameter sharing scheme that fosters policy heterogeneity while still maintaining high sample efficiency. Specifically, Kaleidoscope maintains one set of common parameters alongside multiple sets of distinct, learnable masks for different agents, which dictate how parameters are shared. It promotes diversity among policy networks by encouraging discrepancy among these masks, without sacrificing the efficiency of parameter sharing. This design allows Kaleidoscope to dynamically balance high sample efficiency with broad policy representational capacity, effectively bridging the gap between full parameter sharing and no parameter sharing across various environments. We further extend Kaleidoscope to critic ensembles in the context of actor-critic algorithms, which could help improve value estimation. Our empirical evaluations across a wide range of environments, including the multi-agent particle environment, multi-agent MuJoCo and the StarCraft multi-agent challenge v2, demonstrate the superior performance of Kaleidoscope compared with existing parameter sharing approaches, showcasing its potential for performance enhancement in MARL. The code is publicly available at \url{https://github.com/LXXXXR/Kaleidoscope}.
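The core idea of one shared parameter set filtered by per-agent learnable masks, with a penalty that rewards mask discrepancy, can be illustrated with a minimal sketch. Everything below is a hypothetical illustration, not the released implementation: the names `binary_mask`, `masked_weights` and `mask_diversity` are invented here, masks are obtained by thresholding sigmoid scores at 0.5 (actual training would need a differentiable relaxation such as a straight-through estimator), and mean pairwise L1 distance between masks is used as a stand-in for whatever diversity objective the method actually optimizes.

```python
import math
from itertools import combinations


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def binary_mask(scores):
    # Hypothetical: turn learnable per-parameter scores into a hard 0/1 mask.
    # (Real training would need a differentiable relaxation of this step.)
    return [1.0 if sigmoid(s) > 0.5 else 0.0 for s in scores]


def masked_weights(shared, mask):
    # Each agent's policy uses the common parameters filtered by its own mask.
    return [w * m for w, m in zip(shared, mask)]


def mask_diversity(masks):
    # Stand-in diversity term: mean pairwise L1 distance between agents' masks.
    # Identical masks (full parameter sharing) give 0; larger means more heterogeneous.
    pairs = list(combinations(masks, 2))
    if not pairs:
        return 0.0
    total = sum(sum(abs(a - b) for a, b in zip(m1, m2)) for m1, m2 in pairs)
    return total / len(pairs)


# One shared parameter vector, stored once for all agents.
shared = [0.5, -1.0, 2.0, 0.3]

# Hypothetical learnable mask scores, one vector per agent.
scores = {
    "agent_0": [1.0, 1.0, -1.0, 1.0],
    "agent_1": [1.0, -1.0, 1.0, 1.0],
    "agent_2": [-1.0, 1.0, 1.0, 1.0],
}

masks = {agent: binary_mask(s) for agent, s in scores.items()}
policies = {agent: masked_weights(shared, m) for agent, m in masks.items()}

print(masks["agent_0"])                         # → [1.0, 1.0, 0.0, 1.0]
print(mask_diversity(list(masks.values())))     # → 2.0
```

In this sketch the agents share one parameter vector but each sees a different subset of it, so their policies can differ. Setting all masks equal recovers full parameter sharing (diversity 0), while pushing masks toward disjoint subsets approaches separate, non-shared networks; a diversity term like the one above is one way to nudge training between these extremes.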
