Abstract
We study the problem of cooperative multi-agent reinforcement learning with a single joint reward signal. This class of learning problems is difficult because of the often large combined action and observation spaces. In the fully centralized and decentralized approaches, we find the problem of spurious rewards and a phenomenon we call the "lazy agent" problem, which arises due to partial observability. We address these problems by training individual agents with a novel value decomposition network architecture, which learns to decompose the team value function into agent-wise value functions. We perform an experimental evaluation across a range of partially-observable multi-agent domains and show that learning such value-decompositions leads to superior results, in particular when combined with weight sharing, role information and information channels.
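
As a rough illustration of what decomposing a team value function into agent-wise value functions can look like, the sketch below assumes an additive form, in which the team value of a joint action is the sum of per-agent values. The abstract does not state the form of the decomposition, so the additive form is an assumption here. The tables and the function names (`q_tables`, `team_value`, `decentralized_greedy`, `centralized_greedy`) are hypothetical, and observations are omitted to keep the example small.

```python
from itertools import product

# Hypothetical per-agent value tables: q_tables[i][a] is agent i's value
# for its local action a. The numbers are made up for illustration.
q_tables = [
    [0.1, 0.7, 0.2],  # agent 0 has three actions
    [0.5, 0.3],       # agent 1 has two actions
]

def team_value(joint_action):
    """Assumed additive decomposition: team value = sum of agent values."""
    return sum(q[a] for q, a in zip(q_tables, joint_action))

def decentralized_greedy():
    """Each agent picks its best action using only its own table."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in q_tables)

def centralized_greedy():
    """Brute-force search over the full joint action space, for comparison."""
    joint_actions = product(*(range(len(q)) for q in q_tables))
    return max(joint_actions, key=team_value)

if __name__ == "__main__":
    print(decentralized_greedy(), centralized_greedy())
```

Under this assumed additive form, each agent maximizing its own value also maximizes the team value, so the joint action can be chosen without enumerating the combined action space.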