The Gradient Convergence Bound of Federated Multi-Agent Reinforcement Learning with Efficient Communication

Abstract

This paper considers independent reinforcement learning (IRL) for multi-agent collaborative decision-making in the paradigm of federated learning (FL). However, FL generates excessive communication overhead between agents and a remote central server, especially when it involves a large number of agents or iterations. Besides, due to the heterogeneity of independent learning environments, multiple agents may undergo asynchronous Markov decision processes (MDPs), which affects the training samples and the model's convergence performance. Building on the variation-aware periodic averaging (VPA) method and a policy-based deep reinforcement learning (DRL) algorithm (i.e., proximal policy optimization (PPO)), this paper proposes two advanced optimization schemes oriented toward stochastic gradient descent (SGD): 1) a decay-based scheme gradually decays the weights of a model's local gradients with the progress of successive local updates, and 2) by representing the agents as a graph, a consensus-based scheme studies the impact of exchanging a model's local gradients among nearby agents from an algebraic connectivity perspective. This paper also provides novel convergence guarantees for both developed schemes, and demonstrates their superior effectiveness and efficiency in improving the system's utility value through theoretical analyses and simulation results.
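As a rough illustration of the two ideas named in the abstract, the following Python sketch shows (a) local SGD steps whose gradient weights shrink with each successive local update and (b) one round of gradient exchange among neighboring agents on a graph. It is not the paper's algorithm: the geometric decay `decay ** k`, the step size `eps`, and the function names `decayed_local_update` and `consensus_step` are all assumptions made for this example, and the abstract does not specify the actual decay function or mixing rule.

```python
def decayed_local_update(theta, grads, lr=0.1, decay=0.9):
    """Apply successive local SGD steps to parameter vector `theta`.

    The k-th local gradient is weighted by decay**k (an assumed geometric
    schedule), so later local updates contribute less.
    """
    for k, g in enumerate(grads):
        weight = decay ** k
        theta = [t - lr * weight * gi for t, gi in zip(theta, g)]
    return theta


def consensus_step(values, neighbors, eps=0.25):
    """One synchronous consensus round over an undirected graph.

    `values[i]` is agent i's vector (e.g. its local gradient) and
    `neighbors[i]` lists the agents it exchanges with. Each agent moves
    toward its neighbors' values; `eps` is an assumed mixing step that
    should stay below 1 / (maximum node degree) for stability.
    """
    return [
        [
            v + eps * sum(values[j][d] - v for j in neighbors[i])
            for d, v in enumerate(values[i])
        ]
        for i in range(len(values))
    ]


if __name__ == "__main__":
    # Two local steps with unit gradients; the second step is down-weighted.
    print(decayed_local_update([1.0], [[1.0], [1.0]], lr=0.1, decay=0.5))

    # Three agents on a path graph 0 - 1 - 2 exchanging scalar gradients.
    path = {0: [1], 1: [0, 2], 2: [1]}
    print(consensus_step([[0.0], [3.0], [6.0]], path))
```

On a symmetric graph this mixing rule keeps the network-wide average unchanged while pulling the agents' values together. How quickly repeated rounds contract the disagreement depends on the graph's algebraic connectivity (the second-smallest eigenvalue of its Laplacian), which is the perspective the abstract refers to.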
