Decomposed Soft Actor-Critic Method for Cooperative Multi-Agent Reinforcement Learning

Abstract

Deep reinforcement learning methods have shown great performance on many challenging cooperative multi-agent tasks. Two promising research directions are multi-agent value function decomposition and multi-agent policy gradients. In this paper, we propose a new decomposed multi-agent soft actor-critic (mSAC) method, which combines multi-agent value function decomposition with the soft policy iteration framework. mSAC brings together novel and existing techniques: a decomposed Q value network architecture, a decentralized probabilistic policy, and an optional counterfactual advantage function. Theoretically, mSAC supports efficient off-policy learning and partially addresses the credit assignment problem in both discrete and continuous action spaces. We evaluate mSAC on the StarCraft II micromanagement cooperative multi-agent benchmark, comparing it against its variants and analyzing the effects of the different components. Experimental results demonstrate that mSAC significantly outperforms the policy-based approach COMA and achieves results competitive with the state-of-the-art value-based approach Qmix on most tasks in terms of asymptotic performance. In addition, mSAC performs well on tasks with large action spaces, such as 2c_vs_64zg and MMM2.
