Abstract
We consider the problem of federated offline reinforcement learning (RL), a scenario under which distributed learning agents must collaboratively learn a high-quality control policy using only small pre-collected datasets generated according to different unknown behavior policies. Na\"{i}vely combining a standard offline RL approach with a standard federated learning approach to solve this problem can lead to poorly performing policies. In response, we develop the Federated Ensemble-Directed Offline Reinforcement Learning Algorithm (FEDORA), which distills the collective wisdom of the clients using an ensemble learning approach. We develop the FEDORA codebase to utilize distributed compute resources on a federated learning platform. We show that FEDORA significantly outperforms other approaches, including offline RL over the combined data pool, in various complex continuous control environments and real-world datasets. Finally, we demonstrate the performance of FEDORA in the real world on a mobile robot. We provide our code and a video of our experiments at \url{https://github.com/DesikRengarajan/FEDORA}.