Asynchronous Federated Reinforcement Learning with Policy Gradient Updates: Algorithm Design and Convergence Analysis

Abstract

To improve the efficiency of reinforcement learning (RL), we propose a novel asynchronous federated reinforcement learning (FedRL) framework termed AFedPG, which constructs a global model through collaboration among $N$ agents using policy gradient (PG) updates. To address the challenge of lagged policies in asynchronous settings, we design a delay-adaptive lookahead technique, specifically for FedRL, that can effectively handle heterogeneous arrival times of policy gradients. We analyze the theoretical global convergence bound of AFedPG and characterize the advantage of the proposed algorithm in terms of both sample complexity and time complexity. Specifically, AFedPG achieves $O(\frac{\epsilon^{-2.5}}{N})$ sample complexity for global convergence at each agent on average. Compared to the single-agent setting with $O(\epsilon^{-2.5})$ sample complexity, it enjoys a linear speedup with respect to the number of agents. Moreover, compared to synchronous FedPG, AFedPG improves the time complexity from $O(\frac{t_{\max}}{N})$ to $O\left(\left(\sum_{i=1}^{N} \frac{1}{t_{i}}\right)^{-1}\right)$, where $t_{i}$ denotes the time consumed per iteration at agent $i$ and $t_{\max} = \max_{i} t_{i}$ is the largest of these. The latter complexity is never larger than the former, with equality only when all $t_{i}$ are equal, and the improvement becomes significant in large-scale federated settings with heterogeneous computing power ($t_{\max} \gg t_{\min}$, where $t_{\min} = \min_{i} t_{i}$). Finally, we empirically verify the improved performance of AFedPG in four widely used MuJoCo environments with varying numbers of agents. We also demonstrate the advantages of AFedPG in various computing heterogeneity scenarios.
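The two time-complexity expressions can be compared numerically. The sketch below is only an illustration under simplifying assumptions: it drops all constants and communication costs, and it reads $\frac{t_{\max}}{N}$ as the wall-clock time per gradient when every round waits for the slowest agent, and $\left(\sum_{i=1}^{N} \frac{1}{t_{i}}\right)^{-1}$ as the time per gradient when each agent contributes at its own rate $\frac{1}{t_{i}}$. The function names and the example timings are hypothetical and do not come from the paper.

```python
from typing import Sequence


def sync_time_per_gradient(t: Sequence[float]) -> float:
    """t_max / N: each synchronous round lasts t_max and yields N gradients."""
    return max(t) / len(t)


def async_time_per_gradient(t: Sequence[float]) -> float:
    """(sum_i 1/t_i)^(-1): agent i delivers 1/t_i gradients per unit time."""
    return 1.0 / sum(1.0 / ti for ti in t)


# Hypothetical per-iteration times for N = 4 agents (arbitrary time units).
homogeneous = [1.0, 1.0, 1.0, 1.0]      # all agents equally fast
heterogeneous = [1.0, 1.0, 1.0, 10.0]   # one straggler, t_max >> t_min

for name, times in [("homogeneous", homogeneous), ("heterogeneous", heterogeneous)]:
    s = sync_time_per_gradient(times)
    a = async_time_per_gradient(times)
    print(f"{name}: sync={s:.4f}  async={a:.4f}  ratio={s / a:.2f}")
```

With equal timings the two quantities coincide at 0.25. With a single straggler that is ten times slower, the synchronous value rises to 2.5 while the asynchronous value is about 0.32, which illustrates why the gap grows with computing heterogeneity.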
