LDSA: Learning Dynamic Subtask Assignment in Cooperative Multi-Agent Reinforcement Learning

Abstract

Cooperative multi-agent reinforcement learning (MARL) has made prominent progress in recent years. For training efficiency and scalability, most of the MARL algorithms make all agents share the same policy or value network. However, many complex multi-agent tasks require agents with a variety of specific abilities to handle different subtasks. Sharing parameters indiscriminately may lead to similar behaviors across all agents, which will limit the exploration efficiency and be detrimental to the final performance. To balance the training complexity and the diversity of agents' behaviors, we propose a novel framework for learning dynamic subtask assignment (LDSA) in cooperative MARL. Specifically, we first introduce a subtask encoder that constructs a vector representation for each subtask according to its identity. To reasonably assign agents to different subtasks, we propose an ability-based subtask selection strategy, which can dynamically group agents with similar abilities into the same subtask. Then, we condition the subtask policy on its representation and agents dealing with the same subtask share their experiences to train the subtask policy. We further introduce two regularizers to increase the representation difference between subtasks and avoid agents changing subtasks frequently to stabilize training, respectively. Empirical results show that LDSA learns reasonable and effective subtask assignment for better collaboration and significantly improves the learning performance on the challenging StarCraft II micromanagement benchmark.
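
The two components named in the abstract, a subtask encoder and an ability-based subtask selection, can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the architecture, so the linear one-hot encoder, the dot-product match between an agent's ability vector and a subtask representation, the greedy argmax assignment, and all function names (`make_subtask_encoder`, `select_subtasks`, `group_by_subtask`) are assumptions made purely for illustration.

```python
import math
import random


def make_subtask_encoder(n_subtasks, dim, seed=0):
    """Hypothetical encoder: maps a subtask identity to a vector.

    Assumption: a linear layer applied to a one-hot identity, which is
    equivalent to selecting one row of a weight matrix.
    """
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
               for _ in range(n_subtasks)]

    def encode(subtask_id):
        return weights[subtask_id]

    return encode


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_subtasks(abilities, subtask_reprs):
    """Assign each agent to one subtask.

    Assumption: the match score is the dot product between the agent's
    ability vector and the subtask representation, and the agent picks
    the most probable subtask under a softmax of those scores.
    """
    assignment = []
    for ability in abilities:
        scores = [sum(a * r for a, r in zip(ability, rep))
                  for rep in subtask_reprs]
        probs = softmax(scores)
        assignment.append(max(range(len(probs)), key=probs.__getitem__))
    return assignment


def group_by_subtask(assignment):
    """Collect agent indices per subtask; each group would share one policy."""
    groups = {}
    for agent, subtask in enumerate(assignment):
        groups.setdefault(subtask, []).append(agent)
    return groups
```

Under these assumptions, agents whose ability vectors point in similar directions land in the same group, and each group would then pool its experience to train a single subtask-conditioned policy. The two regularizers mentioned in the abstract are not modeled here.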
