Abstract
Traditional distributed deep reinforcement learning (RL) commonly relies on exchanging the experience replay memory (RM) of each agent. Since the RM contains all state observations and action policy history, exchanging it may incur huge communication overhead while violating the privacy of each agent. As an alternative, this article presents a communication-efficient and privacy-preserving distributed RL framework, coined federated reinforcement distillation (FRD). In FRD, each agent exchanges its proxy experience replay memory (ProxRM), in which policies are locally averaged with respect to proxy states that cluster the actual states. To provide FRD design insights, we present ablation studies on the impact of ProxRM structures, neural network architectures, and communication intervals. Furthermore, we propose an improved version of FRD, coined mixup augmented FRD (MixFRD), in which the ProxRM is interpolated using the mixup data augmentation algorithm. Simulations in a Cartpole environment validate the effectiveness of MixFRD in reducing the variance of mission completion time and the communication cost, compared to the benchmark schemes: vanilla FRD, federated reinforcement learning (FRL), and policy distillation (PD).
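The two memory operations named above can be illustrated with a minimal sketch. It assumes that the proxy states are already given (for example, as cluster centroids), that each actual state is mapped to its nearest proxy state by Euclidean distance, that policies are action-probability vectors, and that mixup follows its usual form, mixing two entries with a weight λ drawn from Beta(α, α). The abstract does not specify the clustering method, how entries are paired for mixup, or the value of α, so these choices and the helper names `nearest_proxy`, `build_proxrm`, and `mixup` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]
Entry = Tuple[Vector, Vector]  # (proxy state, averaged action-probability vector)


def nearest_proxy(state: Sequence[float], proxy_states: Sequence[Sequence[float]]) -> int:
    """Index of the proxy state closest to `state` (squared Euclidean distance).

    Assumption: nearest-centroid assignment stands in for the paper's clustering.
    """
    return min(
        range(len(proxy_states)),
        key=lambda k: sum((s - p) ** 2 for s, p in zip(state, proxy_states[k])),
    )


def build_proxrm(states: Sequence[Sequence[float]],
                 policies: Sequence[Sequence[float]],
                 proxy_states: Sequence[Sequence[float]]) -> List[Entry]:
    """Hypothetical ProxRM: average the policies of all actual states mapped to each proxy state."""
    sums: dict = {}
    counts: dict = {}
    for state, policy in zip(states, policies):
        k = nearest_proxy(state, proxy_states)
        if k not in sums:
            sums[k] = [0.0] * len(policy)
            counts[k] = 0
        sums[k] = [a + b for a, b in zip(sums[k], policy)]
        counts[k] += 1
    # Only proxy states that received at least one actual state get an entry.
    return [
        (tuple(proxy_states[k]), tuple(v / counts[k] for v in sums[k]))
        for k in sorted(sums)
    ]


def mixup(entries: List[Entry], alpha: float = 0.2, n_samples: int = 4,
          rng: random.Random = None) -> List[Entry]:
    """Hypothetical mixup over ProxRM entries (requires at least two entries).

    Each new entry is lam * e1 + (1 - lam) * e2 for a random pair (e1, e2),
    with lam ~ Beta(alpha, alpha), applied to both the state and the policy.
    """
    rng = rng or random.Random(0)
    mixed = []
    for _ in range(n_samples):
        (s1, p1), (s2, p2) = rng.sample(entries, 2)
        lam = rng.betavariate(alpha, alpha)
        state = tuple(lam * a + (1 - lam) * b for a, b in zip(s1, s2))
        policy = tuple(lam * a + (1 - lam) * b for a, b in zip(p1, p2))
        mixed.append((state, policy))
    return mixed


if __name__ == "__main__":
    proxies = [(0.0, 0.0), (1.0, 1.0)]
    states = [(0.1, 0.0), (-0.1, 0.1), (0.9, 1.1)]
    policies = [(0.8, 0.2), (0.6, 0.4), (0.1, 0.9)]
    proxrm = build_proxrm(states, policies, proxies)
    print(proxrm)
    print(mixup(proxrm, alpha=0.2, n_samples=3, rng=random.Random(1)))
```

In this sketch only the ProxRM entries (one averaged policy per proxy state) would be exchanged, rather than every observed state and action, which is the source of the communication and privacy benefit described above. Because a convex combination of probability vectors is again a probability vector, the mixed policies remain valid distributions.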