Tractable and Provably Efficient Distributional Reinforcement Learning with General Value Function Approximation

  • 2024-07-31 01:43:51
  • Taehyun Cho, Seungyub Han, Kyungjae Lee, Seokhun Ju, Dohyeong Kim, Jungwoo Lee
  • 0

Abstract

Distributional reinforcement learning improves performance by effectivelycapturing environmental stochasticity, but a comprehensive theoreticalunderstanding of its effectiveness remains elusive. In this paper, we present aregret analysis for distributional reinforcement learning with general valuefunction approximation in a finite episodic Markov decision process setting. Wefirst introduce a key notion of Bellman unbiasedness for a tractable andexactly learnable update via statistical functional dynamic programming. Ourtheoretical results show that approximating the infinite-dimensional returndistribution with a finite number of moment functionals is the only method tolearn the statistical information unbiasedly, including nonlinear statisticalfunctionals. Second, we propose a provably efficient algorithm,$\texttt{SF-LSVI}$, achieving a regret bound of $\tilde{O}(d_EH^{\frac{3}{2}}\sqrt{K})$ where $H$ is the horizon, $K$ is the number ofepisodes, and $d_E$ is the eluder dimension of a function class.

 

Quick Read (beta)

loading the full paper ...