Improving Offline-to-Online Reinforcement Learning with Q-Ensembles

Abstract

Offline reinforcement learning (RL) is a learning paradigm where an agent learns from a fixed dataset of experience. However, learning solely from a static dataset can limit performance due to the lack of exploration. To overcome this limitation, offline-to-online RL combines offline pre-training with online fine-tuning, which enables the agent to further refine its policy by interacting with the environment in real time. Despite its benefits, existing offline-to-online RL methods suffer from performance degradation and slow improvement during the online phase. To tackle these challenges, we propose a novel framework called Ensemble-based Offline-to-Online (E2O) RL. By increasing the number of Q-networks, we seamlessly bridge offline pre-training and online fine-tuning without degrading performance. Moreover, to expedite online performance enhancement, we appropriately loosen the pessimism of Q-value estimation and incorporate ensemble-based exploration mechanisms into our framework. Experimental results demonstrate that E2O can substantially improve the training stability, learning efficiency, and final performance of existing offline RL methods during online fine-tuning on a range of locomotion and navigation tasks, significantly outperforming existing offline-to-online RL methods.
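To make the idea of a Q-ensemble with adjustable pessimism concrete, the following is a minimal sketch and not the E2O implementation. The abstract does not specify how pessimism is loosened, so the function name `ensemble_targets`, the two modes, and the `beta` parameter are all assumptions chosen for illustration. The sketch contrasts a fully pessimistic target (minimum over all ensemble members) with a looser one (ensemble mean minus a scaled standard deviation).

```python
import numpy as np

def ensemble_targets(q_next, rewards, dones, gamma=0.99, mode="min", beta=1.0):
    """Compute TD targets from an ensemble of next-state Q-value estimates.

    q_next  : array of shape (N, B), one row per Q-network (hypothetical layout)
    rewards : array of shape (B,)
    dones   : array of shape (B,), 1.0 where the episode terminated
    mode    : "min" (pessimistic) or "mean_minus_std" (assumed looser rule)
    beta    : weight on the ensemble standard deviation (illustrative only)
    """
    q_next = np.asarray(q_next, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=float)

    if mode == "min":
        # Most pessimistic choice: lowest estimate across the ensemble.
        v_next = q_next.min(axis=0)
    elif mode == "mean_minus_std":
        # Looser choice when beta is small: penalize by ensemble disagreement.
        v_next = q_next.mean(axis=0) - beta * q_next.std(axis=0)
    else:
        raise ValueError(f"unknown mode: {mode}")

    # Standard one-step bootstrap; terminal transitions do not bootstrap.
    return rewards + gamma * (1.0 - dones) * v_next


if __name__ == "__main__":
    q = [[1.0, 2.0], [3.0, 2.0], [2.0, 5.0]]  # 3 Q-networks, batch of 2
    r = [0.0, 1.0]
    d = [0.0, 1.0]
    print(ensemble_targets(q, r, d, mode="min"))
    print(ensemble_targets(q, r, d, mode="mean_minus_std", beta=0.5))
```

With `beta=0` the second mode reduces to the plain ensemble mean, which is never below the ensemble minimum; larger values of `beta` make the target more conservative again.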
