Robust Reinforcement Learning using Offline Data

Abstract

The goal of robust reinforcement learning (RL) is to learn a policy that is robust against the uncertainty in model parameters. Parameter uncertainty commonly occurs in many real-world RL applications due to simulator modeling errors, changes in the real-world system dynamics over time, and adversarial disturbances. Robust RL is typically formulated as a max-min problem, where the objective is to learn the policy that maximizes the value against the worst possible models that lie in an uncertainty set. In this work, we propose a robust RL algorithm called Robust Fitted Q-Iteration (RFQI), which uses only an offline dataset to learn the optimal robust policy. Robust RL with offline data is significantly more challenging than its non-robust counterpart because of the minimization over all models present in the robust Bellman operator. This poses challenges in offline data collection, optimization over the models, and unbiased estimation. In this work, we propose a systematic approach to overcome these challenges, resulting in our RFQI algorithm. We prove that RFQI learns a near-optimal robust policy under standard assumptions and demonstrate its superior performance on standard benchmark problems.
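To make the max-min formulation concrete, the following is a minimal tabular sketch of a robust Bellman backup. It is not the RFQI algorithm: it assumes a tiny, fully known problem whose uncertainty set is a finite list of candidate transition models, whereas RFQI works from offline data with function approximation. The toy MDP and the names `robust_backup`, `robust_q_iteration`, `nominal`, and `perturbed` are hypothetical and chosen only for illustration.

```python
# Toy robust Q-iteration (illustrative assumption, NOT the paper's RFQI).
# A model is indexed as model[s][a][s_next] = transition probability.


def robust_backup(q, reward, models, gamma):
    """Apply one robust Bellman backup to a tabular Q-function."""
    n_states, n_actions = len(q), len(q[0])
    # Greedy state values under the current Q estimate (the outer "max").
    v = [max(q[s]) for s in range(n_states)]
    new_q = [[0.0] * n_actions for _ in range(n_states)]
    for s in range(n_states):
        for a in range(n_actions):
            # Inner "min": worst expected next-state value over the set.
            worst = min(
                sum(p * v_next for p, v_next in zip(model[s][a], v))
                for model in models
            )
            new_q[s][a] = reward[s][a] + gamma * worst
    return new_q


def robust_q_iteration(reward, models, gamma=0.9, iters=200):
    """Repeat the robust backup from an all-zero Q-table."""
    q = [[0.0] * len(reward[0]) for _ in reward]
    for _ in range(iters):
        q = robust_backup(q, reward, models, gamma)
    return q


# Hypothetical 2-state, 2-action problem: state 1 is rewarding and absorbing.
reward = [[0.0, 0.0], [1.0, 1.0]]
nominal = [
    [[1.0, 0.0], [0.1, 0.9]],  # state 0: action 0 stays, action 1 usually moves
    [[0.0, 1.0], [0.0, 1.0]],  # state 1: both actions stay
]
perturbed = [
    [[1.0, 0.0], [0.5, 0.5]],  # action 1 is less reliable in this model
    [[0.0, 1.0], [0.0, 1.0]],
]

q_nominal = robust_q_iteration(reward, [nominal])
q_robust = robust_q_iteration(reward, [nominal, perturbed])
```

In this sketch, enlarging the uncertainty set can only lower the resulting Q-values, since each backup takes a minimum over more candidate models. The difficulty highlighted in the abstract is that with offline data alone this inner minimization cannot be evaluated by enumerating models as done here.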
