Abstract
Many real-world reinforcement learning tasks require multiple agents to make sequential decisions while interacting with one another, and well-coordinated actions among the agents are crucial to achieving the goal in these tasks. One way to accelerate coordination is to enable the agents to communicate with each other in a distributed manner and behave as a group. In this paper, we study a practical scenario in which (i) the communication bandwidth is limited and (ii) the agents share the communication medium, so that only a restricted number of agents can use the medium simultaneously, as in state-of-the-art wireless networking standards. This calls for some form of communication scheduling. To that end, we propose a multi-agent deep reinforcement learning framework, called SchedNet, in which agents learn how to schedule themselves, how to encode their messages, and how to select actions based on received messages. SchedNet decides which agents should be entitled to broadcast their (encoded) messages by learning the importance of each agent's partially observed information. We evaluate SchedNet against multiple baselines in two different applications, namely, cooperative communication and navigation, and predator-prey. Our experiments show a non-negligible performance gap, ranging from 32% to 43%, between SchedNet and other mechanisms, such as those without communication and those with vanilla scheduling methods, e.g., round robin.
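
To make the scheduling contrast concrete, the following is a minimal sketch, not SchedNet's actual implementation. It assumes each agent exposes a scalar importance weight and that the shared medium admits k broadcasters per step; the top-k selection rule, the function names, and the numeric weights are all illustrative assumptions. In SchedNet the importance would be learned, whereas here it is simply given.

```python
from typing import List, Sequence


def round_robin_schedule(n_agents: int, k: int, step: int) -> List[int]:
    """Baseline: rotate the k slots over the agents, ignoring their observations.

    Assumes k <= n_agents so that the selected agents are distinct.
    """
    start = (step * k) % n_agents
    return sorted((start + i) % n_agents for i in range(k))


def importance_schedule(weights: Sequence[float], k: int) -> List[int]:
    """Assumed rule: grant the k slots to the agents with the largest weights."""
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical importance weights for four agents (made-up numbers).
weights = [0.1, 0.9, 0.3, 0.7]

print(round_robin_schedule(n_agents=4, k=2, step=0))  # [0, 1]
print(importance_schedule(weights, k=2))              # [1, 3]
```

Round robin gives every agent the medium equally often, while the importance-based rule concentrates the limited bandwidth on the agents whose observations are currently judged most valuable.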