Abstract
Multi-agent deep reinforcement learning (MARL) suffers from a lack of commonly used evaluation tasks and criteria, making comparisons between approaches difficult. In this work, we evaluate and compare three different classes of MARL algorithms (independent learning, centralised multi-agent policy gradient, and value decomposition) in a diverse range of fully-cooperative multi-agent learning tasks. Our experiments can serve as a reference for the expected performance of algorithms across different learning tasks. We also provide further insight into (1) when independent learning might be surprisingly effective despite non-stationarity, (2) when centralised training should (and shouldn't) be applied, and (3) which benefits value decomposition can bring.