Agent-Temporal Attention for Reward Redistribution in Episodic Multi-Agent Reinforcement Learning

Abstract

This paper considers multi-agent reinforcement learning (MARL) tasks where agents receive a shared global reward at the end of an episode. The delayed nature of this reward affects the ability of the agents to assess the quality of their actions at intermediate time-steps. This paper focuses on developing methods to learn a temporal redistribution of the episodic reward to obtain a dense reward signal. Solving such MARL problems requires addressing two challenges: identifying (1) the relative importance of states along the length of an episode (along time), and (2) the relative importance of individual agents' states at any single time-step (among agents). In this paper, we introduce Agent-Temporal Attention for Reward Redistribution in Episodic Multi-Agent Reinforcement Learning (AREL) to address these two challenges. AREL uses attention mechanisms to characterize the influence of actions on state transitions along trajectories (temporal attention), and how each agent is affected by other agents at each time-step (agent attention). The redistributed rewards predicted by AREL are dense and can be integrated with any given MARL algorithm. We evaluate AREL on challenging tasks from the Particle World environment and the StarCraft Multi-Agent Challenge. Compared to three state-of-the-art reward redistribution methods, AREL achieves higher rewards in Particle World and improved win rates in StarCraft. Our code is available at https://github.com/baicenxiao/AREL.
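To make the two attention stages concrete, here is a minimal, untrained sketch in plain Python. It is not AREL's implementation (the linked repository holds that). AREL learns its redistribution from data, whereas this sketch simply splits the episodic reward in proportion to softmax-normalized per-step scores so that the dense rewards sum exactly to the episodic reward. Every other choice here is an assumption made for illustration: identity query/key/value projections, agent attention applied before temporal attention, a causal mask on the temporal stage, mean pooling over agents, and a hypothetical linear scoring vector `score_weights`. The function names `self_attention` and `redistribute` are likewise hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def self_attention(items: List[Vector], causal: bool = False) -> List[Vector]:
    """Scaled dot-product self-attention with identity query/key/value maps.

    Item k attends to every item, or only to items 0..k when causal=True.
    """
    d = len(items[0])
    out = []
    for k, q in enumerate(items):
        visible = items[: k + 1] if causal else items
        weights = softmax([dot(q, key) / math.sqrt(d) for key in visible])
        out.append([sum(w * v[j] for w, v in zip(weights, visible)) for j in range(d)])
    return out


def redistribute(features: List[List[Vector]], episodic_reward: float,
                 score_weights: Vector) -> List[float]:
    """features[t][i] is a (hypothetical) embedding of agent i at time-step t."""
    T, n = len(features), len(features[0])
    # Agent attention (assumed first): at each time-step, every agent attends to all agents.
    agent_mixed = [self_attention(features[t]) for t in range(T)]
    # Temporal attention (assumed causal): for each agent, step t attends to steps 0..t.
    temporal: List[List[Vector]] = [[[] for _ in range(n)] for _ in range(T)]
    for i in range(n):
        trajectory = [agent_mixed[t][i] for t in range(T)]
        for t, h in enumerate(self_attention(trajectory, causal=True)):
            temporal[t][i] = h
    # Mean-pool over agents, then score each time-step with a linear head.
    d = len(score_weights)
    pooled = [[sum(temporal[t][i][j] for i in range(n)) / n for j in range(d)]
              for t in range(T)]
    scores = [dot(score_weights, p) for p in pooled]
    # Split the episodic reward in proportion to softmax(scores): the dense rewards sum to R.
    return [episodic_reward * w for w in softmax(scores)]


# Usage: 3 time-steps, 2 agents, 2-dimensional toy features.
features = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [1.0, 0.0]],
    [[0.0, 1.0], [1.0, 1.0]],
]
dense = redistribute(features, episodic_reward=10.0, score_weights=[1.0, -0.5])
print([round(r, 3) for r in dense])
print(round(sum(dense), 6))  # → 10.0
```

Because the agent stage uses shared (here, identity) projections and the pooling is a mean, reordering the agents leaves the redistributed rewards unchanged, which matches the idea of scoring agents' states by their relationships rather than by a fixed agent index. A learned version would replace the identity maps and `score_weights` with trainable parameters.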
