Learning Meta Representations for Agents in Multi-Agent Reinforcement Learning

Abstract

In multi-agent reinforcement learning, the behaviors that agents learn in a single Markov Game (MG) are typically confined to the given agent number (i.e., population size). Every single MG induced by varying population sizes may possess distinct optimal joint strategies and game-specific knowledge, which are modeled independently in modern multi-agent algorithms. In this work, we focus on creating agents that generalize across population-varying MGs. Instead of learning a unimodal policy, each agent learns a policy set that is formed by effective strategies across a variety of games. We propose Meta Representations for Agents (MRA), which explicitly model the game-common and game-specific strategic knowledge. By representing the policy sets with multi-modal latent policies, the common strategic knowledge and diverse strategic modes are discovered with an iterative optimization procedure. We prove that, as an approximation to a constrained mutual information maximization objective, the learned policies can reach Nash Equilibrium in every evaluation MG under the assumption of a Lipschitz game on a sufficiently large latent space. When MRA is deployed with practical latent models of limited size, fast adaptation can be achieved by leveraging first-order gradient information. Extensive experiments show the effectiveness of MRA on both training performance and generalization ability in hard and unseen games.
