Finite-Sample Analyses for Fully Decentralized Multi-Agent Reinforcement Learning

Abstract

Despite the increasing interest in multi-agent reinforcement learning (MARL) in the community, understanding its theoretical foundation has long been recognized as a challenging problem. In this work, we make an attempt towards addressing this problem by providing finite-sample analyses for fully decentralized MARL. Specifically, we consider two fully decentralized MARL settings, where teams of agents are connected by time-varying communication networks and either collaborate or compete in a zero-sum game, in the absence of any central controller. These settings cover many conventional MARL settings in the literature. For both settings, we develop batch MARL algorithms that can be implemented in a fully decentralized fashion, and quantify the finite-sample errors of the estimated action-value functions. Our error analyses characterize how the function class, the number of samples within each iteration, and the number of iterations determine the statistical accuracy of the proposed algorithms. Compared with the finite-sample bounds for single-agent RL, our results involve additional error terms caused by decentralized computation, which is inherent in our decentralized MARL setting. To our knowledge, this work appears to provide the first finite-sample analyses for MARL, shedding light on both the sample and computational efficiency of MARL algorithms.
