Robust Reinforcement Learning using Adversarial Populations

Abstract

Reinforcement Learning (RL) is an effective tool for controller design but can struggle with issues of robustness, failing catastrophically when the underlying system dynamics are perturbed. The Robust RL formulation tackles this by adding worst-case adversarial noise to the dynamics and constructing the noise distribution as the solution to a zero-sum minimax game. However, existing work on learning solutions to the Robust RL formulation has primarily focused on training a single RL agent against a single adversary. In this work, we demonstrate that using a single adversary does not consistently yield robustness to dynamics variations under standard parametrizations of the adversary; the resulting policy is highly exploitable by new adversaries. We propose a population-based augmentation to the Robust RL formulation in which we randomly initialize a population of adversaries and sample from the population uniformly during training. We empirically validate across robotics benchmarks that the use of an adversarial population results in a more robust policy that also improves out-of-distribution generalization. Finally, we demonstrate that this approach provides robustness and generalization comparable to domain randomization on these benchmarks while avoiding a ubiquitous domain randomization failure mode.
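The population-based training loop described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy 1-D system, the scalar "adversaries", the finite-difference updates, and all function and parameter names are hypothetical. The only elements drawn from the abstract are the random initialization of a population of adversaries, the uniform sampling of one adversary per training iteration, and the zero-sum objective.

```python
import random


def init_adversary_population(n_adversaries, max_strength, rng):
    """Randomly initialize adversaries. Each one is a single scalar
    disturbance here (an illustrative assumption, not the paper's setup)."""
    return [rng.uniform(-max_strength, max_strength) for _ in range(n_adversaries)]


def rollout(policy_gain, adversary_force, steps=20):
    """Toy 1-D system x <- x + u + d with control u = -gain * x and
    adversarial disturbance d. Returns the agent's reward (negative cost)."""
    x = 1.0
    total_reward = 0.0
    for _ in range(steps):
        u = -policy_gain * x
        x = x + u + adversary_force
        total_reward -= x * x
    return total_reward


def clip(value, low, high):
    return max(low, min(high, value))


def train(n_adversaries=5, iterations=200, max_strength=0.5, seed=0):
    rng = random.Random(seed)
    population = init_adversary_population(n_adversaries, max_strength, rng)
    gain = 0.1                      # the agent's single policy parameter
    counts = [0] * n_adversaries    # how often each adversary was sampled
    eps, lr_agent, lr_adv = 1e-3, 1e-3, 1e-3

    for _ in range(iterations):
        # Sample one adversary uniformly from the population.
        idx = rng.randrange(n_adversaries)
        counts[idx] += 1
        adv = population[idx]

        # Agent step: finite-difference gradient ascent on its reward.
        g_agent = (rollout(gain + eps, adv) - rollout(gain - eps, adv)) / (2 * eps)
        gain = clip(gain + lr_agent * g_agent, 0.0, 1.0)

        # Adversary step: zero-sum, so it descends the agent's reward.
        g_adv = (rollout(gain, adv + eps) - rollout(gain, adv - eps)) / (2 * eps)
        population[idx] = clip(adv - lr_adv * g_adv, -max_strength, max_strength)

    return gain, population, counts


if __name__ == "__main__":
    gain, population, counts = train()
    print("policy gain:", gain)
    print("adversaries:", population)
    print("sample counts:", counts)
```

In this sketch only the sampled adversary is updated in each iteration, which is one plausible reading of uniform sampling from the population; the abstract does not say how the adversaries themselves are updated.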
