Abstract
This paper considers a distributed reinforcement learning problem in which a network of multiple agents aims to cooperatively maximize the globally averaged return through communication with only local neighbors. A randomized communication-efficient multi-agent actor-critic algorithm is proposed for possibly unidirectional communication relationships depicted by a directed graph. It is shown that the algorithm can solve the problem for strongly connected graphs by allowing each agent to transmit only two scalar-valued variables at a time.