Wasserstein Robust Reinforcement Learning

Abstract

Reinforcement learning algorithms, though successful, tend to over-fit to training environments, hampering their application to the real world. This paper proposes WR$^{2}$L, a robust reinforcement learning algorithm that performs robustly on both low- and high-dimensional control tasks. Our method formalises robust reinforcement learning as a novel min-max game with a Wasserstein constraint for a correct and convergent solver. Apart from the formulation, we also propose an efficient and scalable solver following a novel zero-order optimisation method that we believe can be useful to numerical optimisation in general. We contribute both theoretically and empirically. On the theory side, we prove that WR$^{2}$L converges to a stationary point in the general setting of continuous state and action spaces. Empirically, we demonstrate significant gains compared to standard and robust state-of-the-art algorithms on high-dimensional MuJoCo environments.
