MOORe: Model-based Offline-to-Online Reinforcement Learning

Abstract

With the success of offline reinforcement learning (RL), offline-trained RL policies have the potential to be further improved when deployed online. A smooth transfer of the policy matters for safe real-world deployment. In addition, fast adaptation of the policy plays a vital role in practical online performance improvement. To tackle these challenges, we propose a simple yet efficient algorithm, Model-based Offline-to-Online Reinforcement learning (MOORe), which employs a prioritized sampling scheme that can dynamically adjust the offline and online data for smooth and efficient online adaptation of the policy. We provide a theoretical foundation for our algorithm's design. Experimental results on the D4RL benchmark show that our algorithm transfers smoothly from the offline to the online stage while enabling sample-efficient online adaptation, and that it significantly outperforms existing methods.
