Abstract
We study a Federated Reinforcement Learning (FedRL) problem with constraint heterogeneity. In our setting, we aim to solve a reinforcement learning problem with multiple constraints, where $N$ training agents are located in $N$ different environments, each with limited access to the constraint signals, and the agents are expected to collaboratively learn a policy that satisfies all constraints. Such learning problems are prevalent in Large Language Model (LLM) fine-tuning and healthcare applications. To solve the problem, we propose federated primal-dual policy optimization methods based on traditional policy gradient methods. Specifically, we introduce $N$ local Lagrange functions with which the agents perform local policy updates, and the agents periodically communicate their local policies. Taking natural policy gradient (NPG) and proximal policy optimization (PPO) as the policy optimization methods, we focus on two instances of our algorithms, i.e., FedNPG and FedPPO. We show that FedNPG achieves global convergence at an $\tilde{O}(1/\sqrt{T})$ rate, and that FedPPO efficiently solves complicated learning tasks with the use of deep neural networks.
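To make the primal-dual structure concrete, the following is a minimal sketch on a toy single-state problem. It is not FedNPG or FedPPO: it swaps in a plain (exact) softmax policy gradient for the local update, and it assumes that every agent sees the reward, that agent $i$ sees only constraint $i$, that each local Lagrange function takes the form $r^\top \pi + \lambda_i (c_i^\top \pi - b_i)$, and that communication averages the softmax parameters. The function name `fed_primal_dual`, its arguments, and the step sizes are all hypothetical.

```python
import math


def softmax(theta):
    """Numerically stable softmax over a list of logits."""
    m = max(theta)
    z = [math.exp(t - m) for t in theta]
    s = sum(z)
    return [v / s for v in z]


def fed_primal_dual(r, C, b, rounds=50, local_steps=5,
                    eta_theta=0.5, eta_lam=0.5):
    """Toy federated primal-dual loop (illustrative assumptions only).

    r: rewards per action, length K (assumed visible to all agents).
    C: N rows of constraint utilities; agent i uses only row i.
    b: N thresholds; constraint i reads sum_a pi[a] * C[i][a] >= b[i].
    """
    N, K = len(C), len(r)
    thetas = [[0.0] * K for _ in range(N)]  # local policy parameters
    lams = [0.0] * N                        # local dual variables
    for _ in range(rounds):
        for i in range(N):
            for _ in range(local_steps):
                pi = softmax(thetas[i])
                # Per-action payoff under agent i's local Lagrange function.
                g = [r[a] + lams[i] * C[i][a] for a in range(K)]
                avg = sum(p * v for p, v in zip(pi, g))
                # Primal ascent with the exact softmax policy gradient.
                thetas[i] = [t + eta_theta * p * (v - avg)
                             for t, p, v in zip(thetas[i], pi, g)]
                # Projected dual descent: lam_i grows while constraint i
                # is violated and shrinks toward zero otherwise.
                slack = sum(p * c for p, c in zip(pi, C[i])) - b[i]
                lams[i] = max(0.0, lams[i] - eta_lam * slack)
        # Periodic communication: average the local parameters.
        mean = [sum(thetas[i][a] for i in range(N)) / N for a in range(K)]
        thetas = [list(mean) for _ in range(N)]
    return softmax(thetas[0]), lams


# Example: action 0 pays reward; actions 1 and 2 each serve one constraint.
policy, multipliers = fed_primal_dual(
    r=[1.0, 0.0, 0.0],
    C=[[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    b=[0.2, 0.2],
)
```

The sketch only illustrates the division of labor described above: each agent updates against its own Lagrange function between communication rounds, and the shared policy is formed by aggregation. It carries no convergence guarantee.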