Efficient Inference and Exploration for Reinforcement Learning

  • 2019-11-04 20:25:09
  • Yi Zhu, Jing Dong, Henry Lam


Despite an ever-growing literature on reinforcement learning algorithms and applications, much less is known about their statistical inference. In this paper, we investigate the large-sample behavior of the Q-value estimates, with closed-form characterizations of the asymptotic variances. This allows us to efficiently construct confidence regions for the Q-value and optimal value functions, and to develop policies that minimize their estimation errors. This also leads to a policy exploration strategy that relies on estimating the relative discrepancies among the Q estimates. Numerical experiments show that our exploration strategy outperforms other benchmark approaches.

