POWQMIX: Weighted Value Factorization with Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement Learning

Abstract

Value function factorization methods are commonly used in cooperative multi-agent reinforcement learning, with QMIX receiving significant attention. Many QMIX-based methods introduce monotonicity constraints between the joint action value and individual action values to achieve decentralized execution. However, such constraints limit the representation capacity of value factorization, restricting the joint action values it can represent and hindering the learning of the optimal policy. To address this challenge, we propose the Potentially Optimal Joint Actions Weighted QMIX (POWQMIX) algorithm, which recognizes the potentially optimal joint actions and assigns higher weights to the corresponding losses of these joint actions during training. We theoretically prove that with such a weighted training approach the optimal policy is guaranteed to be recovered. Experiments in matrix games, difficulty-enhanced predator-prey, and StarCraft II Multi-Agent Challenge environments demonstrate that our algorithm outperforms state-of-the-art value-based multi-agent reinforcement learning methods.
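The weighting idea can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how potentially optimal joint actions are recognized or which weights are used, so the `potentially_optimal` flags are taken as given inputs, and the function name `weighted_td_loss` and the parameter `low_weight` are hypothetical.

```python
from typing import Sequence


def weighted_td_loss(
    q_tot: Sequence[float],
    targets: Sequence[float],
    potentially_optimal: Sequence[bool],
    low_weight: float = 0.1,  # assumed down-weight for all other joint actions
) -> float:
    """Mean weighted squared error between joint action values and targets.

    Samples flagged as potentially optimal get weight 1.0; every other
    sample gets `low_weight`. The flags are assumed to be computed elsewhere.
    """
    if not (len(q_tot) == len(targets) == len(potentially_optimal)):
        raise ValueError("all inputs must have the same length")
    if len(q_tot) == 0:
        raise ValueError("inputs must be non-empty")
    if not 0.0 <= low_weight <= 1.0:
        raise ValueError("low_weight must lie in [0, 1]")

    total = 0.0
    for q, y, flag in zip(q_tot, targets, potentially_optimal):
        weight = 1.0 if flag else low_weight
        total += weight * (q - y) ** 2
    return total / len(q_tot)


# Two samples: the first is flagged as potentially optimal, the second is not.
loss = weighted_td_loss([1.0, 2.0], [2.0, 0.0], [True, False], low_weight=0.5)
```

In this toy call the first sample contributes its full squared error, while the second contributes only half of its squared error, so errors on the flagged joint action dominate the update.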
