Minimax Optimal and Computationally Efficient Algorithms for Distributionally Robust Offline Reinforcement Learning

Abstract

Distributionally robust offline reinforcement learning (RL), which seeksrobust policy training against environment perturbation by modeling dynamicsuncertainty, calls for function approximations when facing large state-actionspaces. However, the consideration of dynamics uncertainty introduces essentialnonlinearity and computational burden, posing unique challenges for analyzingand practically employing function approximation. Focusing on a basic settingwhere the nominal model and perturbed models are linearly parameterized, wepropose minimax optimal and computationally efficient algorithms realizingfunction approximation and initiate the study on instance-dependentsuboptimality analysis in the context of robust offline RL. Our results uncoverthat function approximation in robust offline RL is essentially distinct fromand probably harder than that in standard offline RL. Our algorithms andtheoretical results crucially depend on a novel function approximationmechanism incorporating variance information, a new procedure of suboptimalityand estimation uncertainty decomposition, a quantification of the robust valuefunction shrinkage, and a meticulously designed family of hard instances, whichmight be of independent interest.

Quick Read (beta)

loading the full paper ...