Federated Reinforcement Learning: Linear Speedup Under Markovian Sampling

Abstract

Since reinforcement learning algorithms are notoriously data-intensive, the task of sampling observations from the environment is usually split across multiple agents. However, transferring these observations from the agents to a central location can be prohibitively expensive in terms of the communication cost, and it can also compromise the privacy of each agent's local behavior policy. In this paper, we consider a federated reinforcement learning framework where multiple agents collaboratively learn a global model, without sharing their individual data and policies. Each agent maintains a local copy of the model and updates it using locally sampled data. Although having N agents enables the sampling of N times more data, it is not clear if it leads to proportional convergence speedup. We propose federated versions of on-policy TD, off-policy TD and Q-learning, and analyze their convergence. For all these algorithms, to the best of our knowledge, we are the first to consider Markovian noise and multiple local updates, and prove a linear convergence speedup with respect to the number of agents. To obtain these results, we show that federated TD and Q-learning are special cases of a general framework for federated stochastic approximation with Markovian noise, and we leverage this framework to provide a unified convergence analysis that applies to all the algorithms.
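
To make the setting concrete, the following is a minimal sketch of federated on-policy TD(0) in the tabular case. It is an illustration under stated assumptions, not the algorithm as specified in the paper: the random Markov reward process, the function names (`make_mrp`, `true_values`, `federated_td0`), the step size, and the choice to synchronize by plain averaging after a fixed number of local steps are all hypothetical. Each agent keeps a local copy of the value estimate, updates it along its own Markov trajectory (so consecutive samples are correlated), and shares only model parameters, never raw observations.

```python
import random


def make_mrp(num_states, rng):
    """Build a random Markov reward process (assumed toy environment)."""
    P = []
    for _ in range(num_states):
        row = [rng.random() + 0.1 for _ in range(num_states)]
        total = sum(row)
        P.append([p / total for p in row])  # row-stochastic transition matrix
    rewards = [rng.random() for _ in range(num_states)]
    return P, rewards


def true_values(P, rewards, gamma, sweeps=2000):
    """Reference value function via fixed-point iteration of V = r + gamma * P V."""
    n = len(rewards)
    V = [0.0] * n
    for _ in range(sweeps):
        V = [rewards[s] + gamma * sum(P[s][t] * V[t] for t in range(n))
             for s in range(n)]
    return V


def federated_td0(P, rewards, gamma, num_agents, local_steps, rounds, alpha, rng):
    """Tabular TD(0) with several local updates per agent, then averaging."""
    n = len(rewards)
    states = list(range(n))
    global_theta = [0.0] * n
    # Each agent's chain state persists across rounds (Markovian sampling).
    agent_state = [rng.randrange(n) for _ in range(num_agents)]
    for _ in range(rounds):
        local_thetas = []
        for i in range(num_agents):
            theta = list(global_theta)  # start from the shared model
            s = agent_state[i]
            for _ in range(local_steps):
                s_next = rng.choices(states, weights=P[s])[0]
                td_error = rewards[s] + gamma * theta[s_next] - theta[s]
                theta[s] += alpha * td_error
                s = s_next
            agent_state[i] = s
            local_thetas.append(theta)
        # Only parameters are communicated: average the local models.
        global_theta = [sum(th[j] for th in local_thetas) / num_agents
                        for j in range(n)]
    return global_theta


def max_error(a, b):
    """Largest absolute componentwise difference between two vectors."""
    return max(abs(x - y) for x, y in zip(a, b))


rng = random.Random(0)
gamma = 0.9
P, rewards = make_mrp(5, rng)
V = true_values(P, rewards, gamma)
theta = federated_td0(P, rewards, gamma, num_agents=8, local_steps=10,
                      rounds=300, alpha=0.05, rng=rng)
print("max error vs. reference values:", max_error(theta, V))
```

Rerunning the sketch with different values of `num_agents` while holding the other arguments fixed gives an informal feel for how averaging across agents reduces the noise in the estimate; it does not reproduce the paper's analysis or its convergence rates.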
