Provably Efficient Reinforcement Learning with General Value Function Approximation

Abstract

Value function approximation has demonstrated phenomenal empirical success inreinforcement learning (RL). Nevertheless, despite a handful of recent progresson developing theory for RL with linear function approximation, theunderstanding of general function approximation schemes largely remainsmissing. In this paper, we establish the first provable efficiently RLalgorithm with general value function approximation. In particular, we showthat if the value functions admit an approximation with a function class$\mathcal{F}$, our algorithm achieves a regret bound of$\widetilde{O}(\mathrm{poly}(dH)\sqrt{T})$ where $d$ is a complexity measure of$\mathcal{F}$, $H$ is the planning horizon, and $T$ is the number interactionswith the environment. Our theory strictly generalizes recent progress on RLwith linear function approximation and does not make explicit assumptions onthe model of the environment. Moreover, our algorithm is model-free andprovides a framework to justify algorithms used in practice.

Quick Read (beta)

loading the full paper ...