Efficient Inference and Exploration for Reinforcement Learning

Abstract

Despite an ever-growing literature on reinforcement learning algorithms and applications, much less is known about their statistical inference. In this paper, we investigate the large-sample behavior of Q-value estimates and give closed-form characterizations of their asymptotic variances. This allows us to efficiently construct confidence regions for Q-values and optimal value functions, and to develop policies that minimize their estimation errors. It also leads to a policy exploration strategy that relies on estimating the relative discrepancies among the Q estimates. Numerical experiments show that our exploration strategy outperforms other benchmark approaches.
