Communication-Efficient Distributed Reinforcement Learning

Abstract

This paper studies the distributed reinforcement learning (DRL) problem involving a central controller and a group of learners. Two DRL settings that find broad applications are considered: multi-agent reinforcement learning (RL) and parallel RL. In both settings, frequent information exchange between the learners and the controller is required. However, for many distributed systems, e.g., parallel machines for training deep RL algorithms and multi-robot systems for learning optimal coordination strategies, the overhead caused by frequent communication is not negligible and becomes the bottleneck of the overall performance. To overcome this challenge, we develop a new policy gradient method that is amenable to efficient implementation in such communication-constrained settings. By adaptively skipping the policy gradient communication, our method can reduce the communication overhead without degrading the learning accuracy. Analytically, we establish that i) the convergence rate of our algorithm is the same as that of the vanilla policy gradient method for the DRL tasks; and ii) if the distributed computing units are heterogeneous in terms of their reward functions and initial state distributions, the number of communication rounds needed to achieve a targeted learning accuracy is reduced. Numerical experiments on a popular multi-agent RL benchmark corroborate the significant communication reduction of our algorithm compared to the alternatives.
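
To make the idea of adaptively skipping gradient communication concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a scalar parameter, simple quadratic losses standing in for the policy gradient objective, and a hypothetical skip rule: a learner uploads a fresh gradient only when it differs enough from the last one it sent, and the controller otherwise reuses its stale copy. The names `local_gradient`, `run`, and `threshold`, as well as all numeric values, are illustrative assumptions.

```python
# Hypothetical sketch: a lazy-upload rule for distributed gradient steps.
# The skip rule, threshold, and quadratic objectives are illustrative
# assumptions, not the paper's algorithm or experiments.

def local_gradient(theta, target, scale):
    """Gradient of the stand-in loss 0.5 * scale * (theta - target)**2."""
    return scale * (theta - target)


def run(num_rounds=200, step=0.05, threshold=1e-3):
    # Heterogeneous learners as (target, scale) pairs (assumed values).
    learners = [(1.0, 1.0), (2.0, 0.5), (-1.0, 0.1), (0.5, 0.05)]
    theta = 0.0
    # Controller's copy of the most recently uploaded gradient per learner.
    stale = [local_gradient(theta, t, s) for t, s in learners]
    uploads = len(learners)  # the initial round uploads everything

    for _ in range(num_rounds):
        for m, (target, scale) in enumerate(learners):
            fresh = local_gradient(theta, target, scale)
            # Upload only if the gradient moved enough since the last upload.
            if (fresh - stale[m]) ** 2 > threshold:
                stale[m] = fresh
                uploads += 1
        # The controller steps with whatever gradients it currently holds.
        theta -= step * sum(stale)
    return theta, uploads


if __name__ == "__main__":
    theta, uploads = run()
    full = 4 * 201  # uploads if every learner communicated every round
    print(f"theta={theta:.4f}, uploads={uploads} of {full}")
```

In this toy setup, learners whose gradients change slowly (small `scale`) skip most rounds, which loosely mirrors the abstract's claim that heterogeneity across learners reduces the number of communication rounds. The final iterate lands only near the minimizer, within a tolerance governed by `threshold`.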
