Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization

Abstract

Despite the success of single-agent reinforcement learning, multi-agentreinforcement learning (MARL) remains challenging due to complex interactionsbetween agents. Motivated by decentralized applications such as sensornetworks, swarm robotics, and power grids, we study policy evaluation in MARL,where agents with jointly observed state-action pairs and private local rewardscollaborate to learn the value of a given policy. In this paper, we propose adouble averaging scheme, where each agent iteratively performs averaging overboth space and time to incorporate neighboring gradient information and localreward information, respectively. We prove that the proposed algorithmconverges to the optimal solution at a global geometric rate. In particular,such an algorithm is built upon a primal-dual reformulation of the mean squaredprojected Bellman error minimization problem, which gives rise to adecentralized convex-concave saddle-point problem. To the best of ourknowledge, the proposed double averaging primal-dual optimization algorithm isthe first to achieve fast finite-time convergence on decentralizedconvex-concave saddle-point problems.

Quick Read (beta)

loading the full paper ...