Accelerating Online Reinforcement Learning with Offline Datasets

Abstract

Reinforcement learning provides an appealing formalism for learning control policies from experience. However, the classic active formulation of reinforcement learning necessitates a lengthy active exploration process for each behavior, making it difficult to apply in real-world settings. If we can instead allow reinforcement learning to effectively use previously collected data to aid the online learning process, where the data could be expert demonstrations or more generally any prior experience, we could make reinforcement learning a substantially more practical tool. While a number of recent methods have sought to learn offline from previously collected data, it remains exceptionally difficult to train a policy with offline data and improve it further with online reinforcement learning. In this paper, we systematically analyze why this problem is so challenging, and propose a novel algorithm that combines sample-efficient dynamic programming with maximum likelihood policy updates, providing a simple and effective framework that is able to leverage large amounts of offline data and then quickly perform online fine-tuning of reinforcement learning policies. We show that our method enables rapid learning of skills with a combination of prior demonstration data and online experience across a suite of difficult dexterous manipulation and benchmark tasks.
