Abstract
Reinforcement learning (RL) is recognized as lacking generalization and robustness under environmental perturbations, which severely restricts its application to real-world robotics. Prior work claimed that adding regularization to the value function is equivalent to learning a robust policy under uncertain transitions. Although this regularization-robustness transformation is appealing for its simplicity and efficiency, it is still lacking in continuous control tasks. In this paper, we propose a new regularizer named $\textbf{U}$ncertainty $\textbf{S}$et $\textbf{R}$egularizer (USR), obtained by formulating the uncertainty set on the parameter space of the transition function. In particular, USR is flexible enough to be plugged into any existing RL framework. To deal with unknown uncertainty sets, we further propose a novel adversarial approach to generate them based on the value function. We evaluate USR on the Real-world Reinforcement Learning (RWRL) benchmark, demonstrating improved robust performance in perturbed testing environments.