Online Estimation and Inference for Robust Policy Evaluation in Reinforcement Learning

Abstract

Reinforcement learning has emerged as one of the prominent topics attracting attention in modern statistical learning, with policy evaluation being a key component. Unlike the traditional machine learning literature on this topic, our work emphasizes statistical inference for the model parameters and value functions of reinforcement learning algorithms. While most existing analyses assume random rewards to follow standard distributions, we embrace the concept of robust statistics in reinforcement learning by simultaneously addressing issues of outlier contamination and heavy-tailed rewards within a unified framework. In this paper, we develop a fully online robust policy evaluation procedure, and establish the Bahadur-type representation of our estimator. Furthermore, we develop an online procedure to efficiently conduct statistical inference based on the asymptotic distribution. This paper connects robust statistics and statistical inference in reinforcement learning, offering a more versatile and reliable approach to online policy evaluation. Finally, we validate the efficacy of our algorithm through numerical experiments conducted in simulations and real-world reinforcement learning experiments.
