Distributional Robustness and Regularization in Reinforcement Learning

  • 2020-07-14 06:01:03
  • Esther Derman, Shie Mannor

Abstract

Distributionally Robust Optimization (DRO) has made it possible to prove the equivalence between robustness and regularization in classification and regression, thus providing an analytical reason why regularization generalizes well in statistical learning. Although DRO's extension to sequential decision-making overcomes $\textit{external uncertainty}$ through the robust Markov Decision Process (MDP) setting, the resulting formulation is hard to solve, especially on large domains. On the other hand, existing regularization methods in reinforcement learning only address $\textit{internal uncertainty}$ due to stochasticity. Our study aims to facilitate robust reinforcement learning by establishing a dual relation between robust MDPs and regularization. We introduce Wasserstein distributionally robust MDPs and prove that they satisfy out-of-sample performance guarantees. Then, we introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function. We extend the result to linear value function approximation for large state spaces. Our approach provides an alternative formulation of robustness with guaranteed finite-sample performance. Moreover, it suggests using regularization as a practical tool for dealing with $\textit{external uncertainty}$ in reinforcement learning methods.

 
