Abstract
Federated reinforcement learning (FedRL) enables multiple agents to collaboratively learn a policy without sharing the local trajectories they collect during agent-environment interactions. In practice, however, the environments faced by different agents are often heterogeneous, so the single policy learned by existing FedRL algorithms performs poorly on individual agents. In this paper, we take a further step and introduce a \emph{personalized} FedRL framework (PFedRL) that takes advantage of possibly shared common structure among agents in heterogeneous environments. Specifically, we develop a class of PFedRL algorithms named PFedRL-Rep that learns (1) a shared feature representation collaboratively among all agents, and (2) an agent-specific weight vector personalized to each agent's local environment. We analyze the convergence of PFedTD-Rep, a particular instance of the framework with temporal difference (TD) learning and linear representations. To the best of our knowledge, we are the first to prove a linear convergence speedup with respect to the number of agents in the PFedRL setting. To achieve this, we show that PFedTD-Rep is an instance of federated two-timescale stochastic approximation with Markovian noise. Experimental results demonstrate that PFedTD-Rep, along with an extension to the control setting based on deep Q-networks (DQN), not only improves learning in heterogeneous settings but also provides better generalization to new environments.
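To make the split between the shared representation and the agent-specific weights concrete, the following is a minimal sketch of a PFedTD-Rep-style loop under stated assumptions; it is not the authors' algorithm. We assume each agent evaluates a fixed policy on its own randomly generated Markov chain (hypothetical `make_agent_env`), that the linear representation is a matrix `Phi` with one feature row per state, and that agent i predicts V_i(s) = Phi[s] @ w_i. Each agent runs semi-gradient TD(0) along a single trajectory (so the noise is Markovian), updating its personal head `w_i` on a fast timescale (`alpha`) and its copy of `Phi` on a slow timescale (`beta`); the server then averages only `Phi`. All names, step sizes, and the number of local steps are illustrative choices.

```python
import numpy as np


def make_agent_env(num_states, rng):
    """Hypothetical heterogeneous environment: a random Markov chain
    (policy already fixed) with its own transitions and rewards."""
    P = rng.random((num_states, num_states))
    P /= P.sum(axis=1, keepdims=True)  # make rows stochastic
    r = rng.random(num_states)
    return P, r


def local_td_round(Phi, w, P, r, s, gamma, alpha, beta, local_steps, rng):
    """Run `local_steps` semi-gradient TD(0) updates along one trajectory.

    Value estimate: V(s) = Phi[s] @ w, where Phi is the (local copy of the)
    shared representation and w is this agent's personalized weight vector.
    w moves on the fast timescale (alpha), Phi on the slow one (beta).
    """
    Phi = Phi.copy()
    w = w.copy()
    for _ in range(local_steps):
        s_next = rng.choice(len(r), p=P[s])  # Markovian sampling
        td_error = r[s] + gamma * Phi[s_next] @ w - Phi[s] @ w
        grad_w = td_error * Phi[s]   # semi-gradient w.r.t. the local head
        grad_phi = td_error * w      # semi-gradient w.r.t. the row Phi[s]
        w += alpha * grad_w
        Phi[s] += beta * grad_phi
        s = s_next
    return Phi, w, s


def pfedtd_rep_sketch(num_agents=4, num_states=6, dim=3, gamma=0.8,
                      alpha=0.05, beta=0.005, local_steps=10, rounds=200,
                      seed=0):
    rng = np.random.default_rng(seed)
    envs = [make_agent_env(num_states, rng) for _ in range(num_agents)]
    Phi = rng.normal(scale=0.1, size=(num_states, dim))  # shared features
    W = np.zeros((num_agents, dim))                      # personalized heads
    states = [0] * num_agents
    for _ in range(rounds):
        local_phis = []
        for i, (P, r) in enumerate(envs):
            Phi_i, W[i], states[i] = local_td_round(
                Phi, W[i], P, r, states[i], gamma, alpha, beta,
                local_steps, rng)
            local_phis.append(Phi_i)
        # Server step: average only the shared representation; W stays local.
        Phi = np.mean(local_phis, axis=0)
    return Phi, W


Phi, W = pfedtd_rep_sketch()
```

Averaging only `Phi` is what lets every agent's samples improve the common features, which is the source of the speedup in the number of agents, while each `w_i` remains free to fit its own environment.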