PAC: Assisted Value Factorisation with Counterfactual Predictions in Multi-Agent Reinforcement Learning

Abstract

Multi-agent reinforcement learning (MARL) has witnessed significant progress with the development of value function factorization methods. Owing to monotonicity, these methods allow a joint action-value function to be optimized by maximizing factorized per-agent utilities. In this paper, we show that in partially observable MARL problems, an agent's ordering over its own actions could impose concurrent constraints (across different states) on the representable function class, causing significant estimation error during training. We tackle this limitation and propose PAC, a new framework leveraging Assistive information generated from Counterfactual Predictions of optimal joint action selection, which enables explicit assistance to value function factorization through a novel counterfactual loss. A variational inference-based information encoding method is developed to collect and encode the counterfactual predictions from an estimated baseline. To enable decentralized execution, we also derive factorized per-agent policies inspired by a maximum-entropy MARL framework. We evaluate the proposed PAC on multi-agent predator-prey and a set of StarCraft II micromanagement tasks. Empirical results show that PAC outperforms state-of-the-art value-based and policy-based multi-agent reinforcement learning algorithms on all benchmarks.
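The monotonicity property mentioned in the abstract can be illustrated with a minimal sketch. It is not the PAC method itself: the mixer here is a hypothetical weighted sum with non-negative weights (one simple monotonic choice), and the names `greedy_joint_action`, `mix`, and `brute_force_joint_action`, along with the utility values, are assumptions made for illustration. The sketch shows only that, under a monotonic mixer, each agent maximizing its own utility recovers the joint action that maximizes the mixed value.

```python
import itertools

def greedy_joint_action(utilities):
    """Decentralized selection: each agent takes the argmax of its own utility."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in utilities)

def mix(utilities, joint_action, weights, bias=0.0):
    """Hypothetical monotonic mixer: weighted sum with non-negative weights."""
    assert all(w >= 0 for w in weights), "monotonicity requires non-negative weights"
    return bias + sum(w * q[a] for w, q, a in zip(weights, utilities, joint_action))

def brute_force_joint_action(utilities, weights):
    """Centralized selection: search every joint action for the highest mixed value."""
    joint_actions = itertools.product(*(range(len(q)) for q in utilities))
    return max(joint_actions, key=lambda a: mix(utilities, a, weights))

# Illustrative per-agent utilities (agent 0 has 3 actions, agent 1 has 2).
utilities = [[0.2, 1.5, -0.3], [0.9, 0.1]]
weights = [0.7, 2.0]  # assumed strictly positive mixing weights

decentralized = greedy_joint_action(utilities)
centralized = brute_force_joint_action(utilities, weights)
print(decentralized, centralized)
```

With strictly positive weights the two selections coincide, which is what makes decentralized greedy execution consistent with the centralized objective. The representational limitation discussed in the abstract concerns which joint value functions such a monotonic class can express, and this sketch does not model it.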
