Abstract
Most multi-agent reinforcement learning (MARL) methods are limited in the scale of problems they can handle. With increasing numbers of agents, the number of training iterations required to find the optimal behaviors increases exponentially due to the exponentially growing joint state and action spaces. This paper tackles this limitation by introducing a scalable MARL method called Distributed multi-Agent Reinforcement Learning with One-hop Neighbors (DARL1N). DARL1N is an off-policy actor-critic method that addresses the curse of dimensionality by restricting information exchanges among the agents to one-hop neighbors when representing value and policy functions. Each agent optimizes its value and policy functions over a one-hop neighborhood, significantly reducing the learning complexity, yet maintaining expressiveness by training with varying neighbor numbers and states. This structure allows us to formulate a distributed learning framework to further speed up the training procedure. Distributed computing systems, however, contain straggler compute nodes, which are slow or unresponsive due to communication bottlenecks, software or hardware problems. To mitigate the detrimental straggler effect, we introduce a novel coded distributed learning architecture, which leverages coding theory to improve the resilience of the learning system to stragglers. Comprehensive experiments show that DARL1N significantly reduces training time without sacrificing policy quality and is scalable as the number of agents increases. Moreover, the coded distributed learning architecture improves training efficiency in the presence of stragglers.