Provably Efficient Reinforcement Learning with Linear Function Approximation Under Adaptivity Constraints

Abstract

We study reinforcement learning (RL) with linear function approximation underthe adaptivity constraint. We consider two popular limited adaptivity models:batch learning model and rare policy switch model, and propose two efficientonline RL algorithms for linear Markov decision processes. In specific, for thebatch learning model, our proposed LSVI-UCB-Batch algorithm achieves an $\tildeO(\sqrt{d^3H^3T} + dHT/B)$ regret, where $d$ is the dimension of the featuremapping, $H$ is the episode length, $T$ is the number of interactions and $B$is the number of batches. Our result suggests that it suffices to use only$\sqrt{T/dH}$ batches to obtain $\tilde O(\sqrt{d^3H^3T})$ regret. For the rarepolicy switch model, our proposed LSVI-UCB-RareSwitch algorithm enjoys an$\tilde O(\sqrt{d^3H^3T[1+T/(dH)]^{dH/B}})$ regret, which implies that $dH\logT$ policy switches suffice to obtain the $\tilde O(\sqrt{d^3H^3T})$ regret. Ouralgorithms achieve the same regret as the LSVI-UCB algorithm (Jin et al.,2019), yet with a substantially smaller amount of adaptivity.

Quick Read (beta)

loading the full paper ...