Abstract
Federated Reinforcement Learning (FedRL) improves sample efficiency while preserving privacy; however, most existing studies assume homogeneous agents, limiting its applicability in real-world scenarios. This paper investigates FedRL in black-box settings with heterogeneous agents, where each agent employs distinct policy networks and training configurations without disclosing their internal details. Knowledge Distillation (KD) is a promising method for facilitating knowledge sharing among heterogeneous models, but it faces challenges related to the scarcity of public datasets and limitations in knowledge representation when applied to FedRL. To address these challenges, we propose Federated Heterogeneous Policy Distillation (FedHPD), which solves the problem of heterogeneous FedRL by utilizing action probability distributions as a medium for knowledge sharing. We provide a theoretical analysis of FedHPD's convergence under standard assumptions. Extensive experiments corroborate that FedHPD shows significant improvements across various reinforcement learning benchmark tasks, further validating our theoretical findings. Moreover, additional experiments demonstrate that FedHPD operates effectively without the need for an elaborate selection of public datasets.