Robust Reinforcement Learning with Distributional Risk-averse formulation

Abstract

Robust Reinforcement Learning tries to make predictions more robust to changes in the dynamics or rewards of the system. This problem is particularly important when the dynamics and rewards of the environment are estimated from the data. In this paper, we approximate Robust Reinforcement Learning constrained with a $\Phi$-divergence using an approximate Risk-Averse formulation. We show that the classical Reinforcement Learning formulation can be robustified using standard deviation penalization of the objective. Two algorithms based on Distributional Reinforcement Learning, one for discrete and one for continuous action spaces, are proposed and tested in a classical Gym environment to demonstrate the robustness of the algorithms.
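
To illustrate what standard deviation penalization of the objective can look like for discrete actions, here is a minimal sketch. It is not the paper's algorithm: it assumes a distributional critic that returns equally weighted quantile estimates of the return for each action, and the function name `risk_averse_action` and the penalty weight `alpha` are hypothetical names chosen for illustration.

```python
import numpy as np


def risk_averse_action(quantiles: np.ndarray, alpha: float) -> int:
    """Pick the action maximizing mean(Z) - alpha * std(Z).

    quantiles: array of shape (n_actions, n_quantiles), assumed to hold
        equally weighted quantile estimates of the return distribution Z
        for each action (an assumption of this sketch).
    alpha: hypothetical penalty weight; alpha = 0 recovers the usual
        risk-neutral greedy choice.
    """
    mean = quantiles.mean(axis=1)
    std = quantiles.std(axis=1)  # population standard deviation over quantiles
    return int(np.argmax(mean - alpha * std))


# Toy return distributions: action 0 has the higher mean but a wide spread,
# action 1 has a slightly lower mean and a narrow spread.
quantiles = np.array([
    [0.0, 10.0, 20.0],
    [8.0, 9.0, 10.0],
])

risk_neutral = risk_averse_action(quantiles, alpha=0.0)
risk_averse = risk_averse_action(quantiles, alpha=1.0)
```

With `alpha = 0` the greedy choice is the high-mean, high-variance action; with `alpha = 1` the penalty shifts the choice to the low-variance action, which is the behavior the penalized objective is meant to encourage.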
