Abstract
Deep multi-agent reinforcement learning (MARL) holds the promise of automating many real-world cooperative robotic manipulation and transportation tasks. Nevertheless, decentralised cooperative robotic control has received less attention from the deep reinforcement learning community, as compared to single-agent robotics and multi-agent games with discrete actions. To address this gap, this paper introduces Multi-Agent Mujoco, an easily extensible multi-agent benchmark suite for robotic control in continuous action spaces. The benchmark tasks are diverse and admit easily configurable partially observable settings. Inspired by the success of single-agent continuous value-based algorithms in robotic control, we also introduce COMIX, a novel extension to a common discrete action multi-agent $Q$-learning algorithm. We show that COMIX significantly outperforms state-of-the-art MADDPG on a partially observable variant of a popular particle environment and matches or surpasses it on Multi-Agent Mujoco. Thanks to this new benchmark suite and method, we can now pose an interesting question: what is the key to performance in such settings, the use of value-based methods instead of policy gradients, or the factorisation of the joint $Q$-function? To answer this question, we propose a second new method, FacMADDPG, which factors MADDPG's critic. Experimental results on Multi-Agent Mujoco suggest that factorisation is the key to performance.