Provably Efficient Reinforcement Learning with Linear Function Approximation

Abstract

Modern Reinforcement Learning (RL) is commonly applied to practical problemswith an enormous number of states, where function approximation must bedeployed to approximate either the value function or the policy. Theintroduction of function approximation raises a fundamental set of challengesinvolving computational and statistical efficiency, especially given the needto manage the exploration/exploitation tradeoff. As a result, a core RLquestion remains open: how can we design provably efficient RL algorithms thatincorporate function approximation? This question persists even in a basicsetting with linear dynamics and linear rewards, for which only linear functionapproximation is needed. This paper presents the first provable RL algorithm with both polynomialruntime and polynomial sample complexity in this linear setting, withoutrequiring a "simulator" or additional assumptions. Concretely, we prove that anoptimistic modification of Least-Squares Value Iteration (LSVI)---a classicalalgorithm frequently studied in the linear setting---achieves$\tilde{\mathcal{O}}(\sqrt{d^3H^3T})$ regret, where $d$ is the ambientdimension of feature space, $H$ is the length of each episode, and $T$ is thetotal number of steps. Importantly, such regret is independent of the number ofstates and actions.

Quick Read (beta)

loading the full paper ...