QR-MIX: Distributional Value Function Factorisation for Cooperative Multi-Agent Reinforcement Learning

Abstract

In cooperative Multi-Agent Reinforcement Learning (MARL) under the Centralized Training with Decentralized Execution (CTDE) setting, agents observe and interact with their environment locally and independently. With local observation and random sampling, the randomness in rewards and observations leads to randomness in long-term returns. Existing methods such as Value Decomposition Network (VDN) and QMIX estimate the mean value of long-term returns while ignoring this randomness. Our proposed model, QR-MIX, introduces quantile regression, modeling joint state-action values as a distribution by combining QMIX with Implicit Quantile Network (IQN). In addition, because the monotonicity constraint in QMIX limits the expressiveness of the joint state-action value distribution and may lead to incorrect estimates in nonmonotonic cases, we design a flexible loss function to replace the absolute weights found in QMIX. Our methods enhance the expressiveness of the mixing network and are more tolerant of randomness and nonmonotonicity. Experiments demonstrate that QR-MIX outperforms prior works in the StarCraft Multi-Agent Challenge (SMAC) environment.
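
To make the quantile-regression idea concrete, the following is a minimal sketch of the quantile Huber loss commonly used with IQN-style networks. It is not taken from the paper: the function names, the threshold `kappa`, and the choice to average over all (quantile, target) pairs are assumptions made for illustration, and the flexible loss that replaces QMIX's absolute weights is not shown here.

```python
def huber(u, kappa=1.0):
    """Huber loss: quadratic for |u| <= kappa, linear beyond it."""
    a = abs(u)
    return 0.5 * u * u if a <= kappa else kappa * (a - 0.5 * kappa)


def quantile_huber_loss(pred_quantiles, taus, target_samples, kappa=1.0):
    """Mean quantile Huber loss over all (quantile, target) pairs.

    pred_quantiles[i] is the estimated return quantile at fraction taus[i];
    target_samples[j] are samples of the target return distribution.
    Averaging over both axes is an illustrative choice (an assumption);
    some implementations sum over the quantile axis instead.
    """
    total = 0.0
    for q, tau in zip(pred_quantiles, taus):
        for t in target_samples:
            u = t - q  # error for this (quantile, target) pair
            # Asymmetric weight: tau when the estimate is too low (u >= 0),
            # 1 - tau when it is too high (u < 0).
            weight = abs(tau - (1.0 if u < 0 else 0.0))
            total += weight * huber(u, kappa) / kappa
    return total / (len(pred_quantiles) * len(target_samples))


# The same under-estimate is penalised more at a high quantile fraction.
loss_high_tau = quantile_huber_loss([0.0], [0.9], [2.0])
loss_low_tau = quantile_huber_loss([0.0], [0.1], [2.0])
```

Because the weight depends on the sign of the error, minimising this loss pushes each estimate toward the quantile of the return distribution at its fraction rather than toward the mean, which is what distinguishes a distributional estimate from the mean-value estimates of VDN and QMIX.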
