Multi-Agent Common Knowledge Reinforcement Learning

Abstract

Cooperative multi-agent reinforcement learning often requires decentralised policies, which severely limit the agents' ability to coordinate their behaviour. In this paper, we show that common knowledge between agents allows for complex decentralised coordination. Common knowledge arises naturally in a large number of decentralised cooperative multi-agent tasks, for example, when agents can reconstruct parts of each other's observations. Since agents can independently agree on their common knowledge, they can execute complex coordinated policies that condition on this knowledge in a fully decentralised fashion. We propose multi-agent common knowledge reinforcement learning (MACKRL), a novel stochastic actor-critic algorithm that learns a hierarchical policy tree. Higher levels in the hierarchy coordinate groups of agents by conditioning on their common knowledge, or delegate to lower levels with smaller subgroups but potentially richer common knowledge. The entire policy tree can be executed in a fully decentralised fashion. As the lowest policy tree level consists of independent policies for each agent, MACKRL reduces to independently learnt decentralised policies as a special case. We demonstrate that our method can exploit common knowledge for superior performance on complex decentralised coordination tasks, including a stochastic matrix game and challenging problems in StarCraft II unit micromanagement.
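
The two-level case of the policy tree described above can be illustrated with a toy sketch. This is not the MACKRL implementation: the policies are hand-written rather than learnt, and every name here (`pair_controller`, `individual_policy`, `act`, `DELEGATE`, the `both_see_each_other` and `enemy_visible` keys, and the shared-seed mechanism) is a hypothetical choice made for illustration. It only aims to show why a controller that conditions solely on common knowledge and shared randomness can be evaluated separately by each agent and still yield a consistent joint action.

```python
import random

DELEGATE = "delegate"  # hypothetical marker: hand control to the lower level


def pair_controller(common_knowledge, rng):
    """Toy top-level policy: pick a joint action or delegate.

    It depends only on common knowledge and a shared random stream, so
    every agent that evaluates it obtains the same result.
    """
    if common_knowledge.get("both_see_each_other"):
        # Sample a coordinated joint action from a toy distribution.
        return rng.choice([("attack", "attack"), ("retreat", "retreat")])
    return DELEGATE


def individual_policy(agent_id, observation, rng):
    """Toy lowest-level policy: each agent acts on its own observation."""
    return "attack" if observation.get("enemy_visible") else "scout"


def act(agent_id, common_knowledge, observation, shared_seed, private_seed):
    """Decentralised execution for one agent in a two-agent team."""
    shared_rng = random.Random(shared_seed)  # assumed identical on all agents
    choice = pair_controller(common_knowledge, shared_rng)
    if choice == DELEGATE:
        private_rng = random.Random(private_seed)
        return individual_policy(agent_id, observation, private_rng)
    return choice[agent_id]  # this agent's component of the joint action


# Each agent calls act() on its own; no messages are exchanged.
ck = {"both_see_each_other": True}
a0 = act(0, ck, {"enemy_visible": True}, shared_seed=7, private_seed=1)
a1 = act(1, ck, {"enemy_visible": False}, shared_seed=7, private_seed=2)
```

When the common knowledge is empty, the controller delegates and each agent falls back to its independent policy, which mirrors the special case mentioned in the abstract where the method reduces to independent decentralised policies.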
