A Deeper Understanding of State-Based Critics in Multi-Agent Reinforcement Learning

Abstract

Centralized Training for Decentralized Execution, where training is done in a centralized offline fashion, has become a popular solution paradigm in Multi-Agent Reinforcement Learning. Many such methods take the form of actor-critic with state-based critics, since centralized training allows access to the true system state, which can be useful during training despite not being available at execution time. State-based critics have become a common empirical choice, albeit one which has had limited theoretical justification or analysis. In this paper, we show that state-based critics can introduce bias in the policy gradient estimates, potentially undermining the asymptotic guarantees of the algorithm. We also show that, even if the state-based critics do not introduce any bias, they can still result in a larger gradient variance, contrary to the common intuition. Finally, we show the effects of the theories in practice by comparing different forms of centralized critics on a wide range of common benchmarks, and detail how various environmental properties are related to the effectiveness of different types of critics.
