Communication-Efficient Policy Gradient Methods for Distributed Reinforcement Learning

Abstract

This paper deals with distributed policy optimization in reinforcement learning, which involves a central controller and a group of learners. In particular, two typical settings encountered in several applications are considered: multi-agent reinforcement learning (RL) and parallel RL, where frequent information exchanges between the learners and the controller are required. For many practical distributed systems, however, the overhead caused by these frequent communication exchanges is considerable, and becomes the bottleneck of the overall performance. To address this challenge, a novel policy gradient approach is developed for solving distributed RL. The novel approach adaptively skips the policy gradient communication during iterations, and can reduce the communication overhead without degrading learning performance. It is established analytically that: i) the novel algorithm has a convergence rate identical to that of the plain-vanilla policy gradient; while ii) if the distributed learners are heterogeneous in terms of their reward functions, the number of communication rounds needed to achieve a desirable learning accuracy is markedly reduced. Numerical experiments corroborate the communication reduction attained by the novel algorithm compared to alternatives.
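
To make the idea of adaptively skipping gradient communication concrete, the following is a minimal toy sketch, not the algorithm analyzed in the paper. It assumes scalar parameters, quadratic stand-in rewards in place of true policy gradients, and a hypothetical skipping rule: a learner uploads a fresh gradient only when its squared change since the last upload exceeds `xi` times the squared size of the most recent parameter step. The names `A`, `B`, `xi`, `local_gradient`, and `run` are illustrative assumptions.

```python
# Toy sketch of "lazy" gradient communication (assumed rule, not the paper's).
# Learner m has a stand-in reward r_m(theta) = -0.5 * A[m] * (theta - B[m])**2,
# so its gradient is -A[m] * (theta - B[m]). Different A[m] values mimic
# learners with heterogeneous reward functions.

A = [0.1, 0.2, 1.0, 2.0]   # hypothetical per-learner curvatures
B = [1.0, -1.0, 2.0, 0.5]  # hypothetical per-learner optima


def local_gradient(m, theta):
    """Gradient of learner m's stand-in reward at theta."""
    return -A[m] * (theta - B[m])


def run(skip, steps=200, lr=0.05, xi=0.25):
    """Gradient ascent at the controller; returns (theta, number of uploads)."""
    num_learners = len(A)
    theta = 0.0
    prev_theta = 0.0
    # Initial round: every learner uploads its gradient once.
    stale = [local_gradient(m, theta) for m in range(num_learners)]
    uploads = num_learners
    for _ in range(steps):
        step_sq = (theta - prev_theta) ** 2
        for m in range(num_learners):
            fresh = local_gradient(m, theta)
            # Hypothetical trigger: upload only if the gradient changed enough
            # relative to how far the parameter just moved.
            changed_enough = (fresh - stale[m]) ** 2 > xi * step_sq
            if not skip or changed_enough:
                stale[m] = fresh
                uploads += 1
        prev_theta = theta
        # The controller ascends using the most recently uploaded gradients.
        theta += lr * sum(stale)
    return theta, uploads


if __name__ == "__main__":
    print("always upload:", run(skip=False))
    print("lazy upload:  ", run(skip=True))
```

In this toy setting, learners whose gradients change slowly (small `A[m]`) trigger the rule less often, which gives some intuition for why heterogeneity among learners can reduce the number of communication rounds. It is an illustration only and says nothing about the paper's actual rates or experiments.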
