Beyond Joint Demonstrations: Personalized Expert Guidance for Efficient Multi-Agent Reinforcement Learning

Abstract

Multi-Agent Reinforcement Learning (MARL) algorithms face the challenge of efficient exploration due to the exponential increase in the size of the joint state-action space. While demonstration-guided learning has proven beneficial in single-agent settings, its direct applicability to MARL is hindered by the practical difficulty of obtaining joint expert demonstrations. In this work, we introduce a novel concept of personalized expert demonstrations, tailored for each individual agent or, more broadly, each individual type of agent within a heterogeneous team. These demonstrations pertain solely to single-agent behaviors and how each agent can achieve personal goals, without encompassing any cooperative elements; thus, naively imitating them will not achieve cooperation due to potential conflicts. To this end, we propose an approach that selectively utilizes personalized expert demonstrations as guidance and allows agents to learn to cooperate, namely personalized expert-guided MARL (PegMARL). This algorithm utilizes two discriminators: the first provides incentives based on the alignment of individual agent behavior with demonstrations, and the second regulates incentives based on whether the behaviors lead to the desired outcome. We evaluate PegMARL using personalized demonstrations in both discrete and continuous environments. The experimental results demonstrate that PegMARL outperforms state-of-the-art MARL algorithms in solving coordinated tasks, achieving strong performance even when provided with suboptimal personalized demonstrations. We also showcase PegMARL's capability of leveraging joint demonstrations in the StarCraft scenario and converging effectively even with demonstrations from non-co-trained policies.
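
The two-discriminator idea can be illustrated with a minimal sketch. This is not the paper's formulation: the abstract does not give the exact reward rule, so the multiplicative gating, the sigmoid squashing, the weight `eta`, and all function and parameter names below are assumptions made purely for illustration. The sketch shows one plausible way a per-agent demonstration-alignment score could be scaled by an outcome score before being added to the environment reward.

```python
import math


def sigmoid(x: float) -> float:
    """Squash a raw discriminator logit into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def shaped_reward(env_reward: float,
                  behavior_logit: float,
                  outcome_logit: float,
                  eta: float = 0.5) -> float:
    """Hypothetical reward shaping for a single agent.

    behavior_logit: assumed output of the first discriminator, scoring how
        closely the agent's local behavior matches its personalized
        demonstrations.
    outcome_logit: assumed output of the second discriminator, scoring
        whether that behavior leads to the desired outcome.
    eta: assumed weight of the demonstration incentive.
    """
    behavior_score = sigmoid(behavior_logit)
    outcome_score = sigmoid(outcome_logit)
    # The outcome score gates the imitation incentive: demonstration-like
    # behavior that does not lead to the desired outcome earns little bonus.
    incentive = behavior_score * outcome_score
    return env_reward + eta * incentive


if __name__ == "__main__":
    # Demonstration-like behavior that the second discriminator endorses.
    print(shaped_reward(1.0, behavior_logit=3.0, outcome_logit=3.0))
    # The same behavior when the second discriminator rejects it.
    print(shaped_reward(1.0, behavior_logit=3.0, outcome_logit=-3.0))
```

Under these assumptions, an agent is rewarded for following its personalized demonstration only to the extent that doing so still works in the multi-agent setting, which matches the abstract's description of using the demonstrations selectively.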
