Abstract
Traditional multi-agent reinforcement learning algorithms are difficult to apply in large-scale multi-agent environments. In recent years, the introduction of mean-field theory has improved the scalability of multi-agent reinforcement learning. This paper considers partially observable multi-agent reinforcement learning (MARL), where each agent can observe only the other agents within a fixed range. This partial observability limits an agent's ability to assess the quality of the actions of surrounding agents. This paper focuses on developing a method that captures more effective information from local observations in order to select more effective actions. Previous work in this field uses probability distributions or a weighted mean field to update the average action of neighborhood agents, but it does not fully consider the feature information of surrounding neighbors, which leads to a local optimum. In this paper, we propose a novel multi-agent reinforcement learning algorithm, Partially Observable Mean Field Multi-Agent Reinforcement Learning based on Graph Attention (GAMFQ), to remedy this flaw. GAMFQ uses a graph attention module and a mean-field module to describe how an agent is influenced by the actions of other agents at each time step. The graph attention module consists of a graph attention encoder and a differentiable attention mechanism; this mechanism outputs a dynamic graph that represents the effectiveness of neighborhood agents with respect to the central agent. The mean-field module approximates the effect of the neighborhood agents on a central agent as the average effect of the effective neighborhood agents. We evaluate GAMFQ on three challenging tasks in the MAgent framework. Experiments show that GAMFQ outperforms baselines, including state-of-the-art partially observable mean-field reinforcement learning algorithms.
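To make the idea of averaging only over "effective" neighbors concrete, here is a minimal sketch in plain Python, assuming fixed feature vectors and a hard threshold. It is not GAMFQ's implementation: the scaled dot-product scoring, the default threshold (keep neighbors whose attention weight is at least the uniform weight 1/n), and every function name below are hypothetical stand-ins for the learned graph attention encoder and differentiable attention mechanism described above. The sketch only shows how a dynamic neighbor set changes the mean action compared with averaging over all observed neighbors.

```python
import math
from typing import List, Optional, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(central: Sequence[float],
                      neighbors: Sequence[Sequence[float]]) -> List[float]:
    """Scaled dot-product attention of the central agent over its neighbors.

    Hypothetical: raw feature vectors stand in for the learned embeddings
    that a graph attention encoder would produce.
    """
    d = len(central)
    scores = [sum(c * x for c, x in zip(central, nb)) / math.sqrt(d)
              for nb in neighbors]
    return softmax(scores)


def effective_mean_action(central: Sequence[float],
                          neighbors: Sequence[Sequence[float]],
                          actions: Sequence[int],
                          n_actions: int,
                          threshold: Optional[float] = None) -> List[float]:
    """Mean one-hot action over the neighbors kept in the (hard) dynamic graph."""
    if not neighbors:
        return [0.0] * n_actions  # no observed neighbors: zero mean action
    weights = attention_weights(central, neighbors)
    if threshold is None:
        # Hypothetical rule: keep neighbors weighted at least as much as uniform.
        # The largest weight is always >= 1/n, so at least one neighbor is kept.
        threshold = 1.0 / len(neighbors)
    kept = [a for a, w in zip(actions, weights) if w >= threshold]
    mean = [0.0] * n_actions
    for a in kept:
        mean[a] += 1.0 / len(kept)  # average of one-hot actions
    return mean


if __name__ == "__main__":
    central = [1.0, 0.0]
    neighbors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
    actions = [2, 0, 2]
    print(effective_mean_action(central, neighbors, actions, n_actions=3))
```

In this toy case the neighbor whose features differ most from the central agent receives a below-uniform attention weight and is dropped, so the mean action becomes `[0.0, 0.0, 1.0]` instead of the plain average `[1/3, 0, 2/3]` over all three neighbors. A hard threshold is used here only for readability; a trainable version would need a differentiable selection step, which this sketch does not attempt.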