M$^3$RL: Mind-aware Multi-agent Management Reinforcement Learning

  • 2019-03-07 06:02:40
  • Tianmin Shu, Yuandong Tian
Most of the prior work on multi-agent reinforcement learning (MARL) achieves optimal collaboration by directly controlling the agents to maximize a common reward. In this paper, we aim to address this from a different angle. In particular, we consider scenarios where there are self-interested agents (i.e., worker agents) which have their own minds (preferences, intentions, skills, etc.) and cannot be dictated to perform tasks they do not wish to do. To achieve optimal coordination among these agents, we train a super agent (i.e., the manager) to manage them by first inferring their minds based on both current and past observations and then initiating contracts that assign suitable tasks to workers and promise to reward them with corresponding bonuses so that they will agree to work together. The objective of the manager is to maximize the overall productivity while minimizing the payments made to the workers for ad-hoc worker teaming. To train the manager, we propose Mind-aware Multi-agent Management Reinforcement Learning (M$^3$RL), which consists of agent modeling and policy learning. We have evaluated our approach in two environments, Resource Collection and Crafting, to simulate multi-agent management problems with various task settings and multiple designs for the worker agents. The experimental results have validated the effectiveness of our approach in modeling worker agents' minds online, and in achieving optimal ad-hoc teaming with good generalization and fast adaptation.


