Abstract
Federated Reinforcement Learning (FRL) allows multiple agents to collaboratively build a decision-making policy without sharing raw trajectories. However, if even a small fraction of these agents are adversarial, the outcome can be catastrophic. We propose a policy-gradient-based approach that is robust to adversarial agents, which may send arbitrary values to the server. In this setting, our results constitute the first global convergence guarantees with general parametrization. These results demonstrate resilience to adversaries while achieving the optimal sample complexity of order $\tilde{\mathcal{O}}\left( \frac{1}{N\epsilon^2} \left( 1+\frac{f^2}{N}\right)\right)$, where $N$ is the total number of agents and $f<N/2$ is the number of adversarial agents.
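To make the scaling of this bound concrete, the following is a minimal sketch that evaluates only the expression inside $\tilde{\mathcal{O}}(\cdot)$. It drops the constants and logarithmic factors that the notation hides, so its output is useful only for comparing settings, not as an actual sample count. The function name `sample_complexity_order` is hypothetical and does not come from the paper.

```python
def sample_complexity_order(n_agents: int, n_adversarial: int, eps: float) -> float:
    """Evaluate (1 / (N * eps^2)) * (1 + f^2 / N), the term inside the tilde-O.

    Illustrative only: constants and log factors hidden by tilde-O are omitted.
    """
    # The bound is stated for fewer than half of the agents being adversarial.
    if not 0 <= n_adversarial < n_agents / 2:
        raise ValueError("requires 0 <= f < N/2")
    if eps <= 0:
        raise ValueError("eps must be positive")
    # Adversary-free rate: improves linearly with the number of agents N.
    base = 1.0 / (n_agents * eps**2)
    # Multiplicative penalty for f adversarial agents.
    penalty = 1.0 + n_adversarial**2 / n_agents
    return base * penalty


# Compare the adversary-free case with f = 10 adversaries among N = 100 agents.
clean = sample_complexity_order(100, 0, 0.5)
attacked = sample_complexity_order(100, 10, 0.5)
print(attacked / clean)
```

For instance, with $N=100$ and $f=10$, the penalty factor is $1+f^2/N = 2$. More generally, any $f \le \sqrt{N}$ at most doubles the bound relative to $f=0$, so the rate retains its $1/N$ dependence up to a constant factor.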