Abstract
Multi-agent reinforcement learning typically employs a centralized training with decentralized execution (CTDE) framework to alleviate non-stationarity in the environment. However, partial observability during execution may lead agents to accumulate gap errors, impairing the training of effective collaborative policies. To overcome this challenge, we introduce the Double Distillation Network (DDN), which incorporates two distillation modules aimed at enhancing robust coordination and facilitating collaboration under constrained information. The external distillation module uses a global guiding network and a local policy network, employing distillation to reconcile the gap between global training and local execution. In addition, the internal distillation module introduces intrinsic rewards, derived from state information, to enhance the exploration capabilities of agents. Extensive experiments demonstrate that DDN significantly improves performance across multiple scenarios.
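
To make the two ideas above concrete, the following is a minimal sketch, not the DDN implementation. The abstract does not state the loss or the reward-combination rule, so the mean-squared-error distillation term, the additive intrinsic bonus, the weight `beta`, and all function names and numbers here are illustrative assumptions.

```python
from typing import Sequence


def distillation_loss(teacher_q: Sequence[float], student_q: Sequence[float]) -> float:
    """Mean squared error between teacher and student action values.

    Assumption: the global guiding network acts as teacher and the local
    policy network as student; MSE is one common distillation choice.
    """
    if len(teacher_q) != len(student_q):
        raise ValueError("teacher and student must score the same actions")
    return sum((t - s) ** 2 for t, s in zip(teacher_q, student_q)) / len(teacher_q)


def shaped_reward(extrinsic: float, intrinsic: float, beta: float = 0.1) -> float:
    """Environment reward plus a weighted intrinsic bonus (beta is hypothetical)."""
    return extrinsic + beta * intrinsic


# Hypothetical action values: the teacher sees the global state,
# the student sees only a local observation.
teacher = [1.0, 2.0, 3.0]
student = [1.0, 1.0, 5.0]
loss = distillation_loss(teacher, student)
```

Under these assumptions, minimizing `distillation_loss` pulls the locally conditioned values toward the globally informed ones, while `shaped_reward` shows one simple way an intrinsic signal could be mixed into the training reward.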