Provable Benefit of Multitask Representation Learning in Reinforcement Learning

Abstract

Although representation learning has become a powerful technique for reducing sample complexity in reinforcement learning (RL) in practice, theoretical understanding of its advantage is still limited. In this paper, we theoretically characterize the benefit of representation learning under the low-rank Markov decision process (MDP) model. We first study multitask low-rank RL (as upstream training), where all tasks share a common representation, and propose a new multitask reward-free algorithm called REFUEL. REFUEL learns both the transition kernel and the near-optimal policy for each task, and outputs a well-learned representation for downstream tasks. Our result demonstrates that multitask representation learning is provably more sample-efficient than learning each task individually, as long as the total number of tasks is above a certain threshold. We then study the downstream RL in both online and offline settings, where the agent is assigned a new task sharing the same representation as the upstream tasks. For both online and offline settings, we develop a sample-efficient algorithm, and show that it finds a near-optimal policy with the suboptimality gap bounded by the sum of the estimation error of the learned representation in upstream and a term that vanishes as the number of downstream samples becomes large. Our downstream results for online and offline RL further capture the benefit of employing the learned representation from upstream as opposed to learning the representation of the low-rank model directly. To the best of our knowledge, this is the first theoretical study that characterizes the benefit of representation learning in exploration-based reward-free multitask RL for both upstream and downstream tasks.
