Bayesian Design Principles for Offline-to-Online Reinforcement Learning

  • 2024-05-31 17:31:07
  • Hao Hu, Yiqin Yang, Jianing Ye, Chengjie Wu, Ziqing Mai, Yujing Hu, Tangjie Lv, Changjie Fan, Qianchuan Zhao, Chongjie Zhang


Offline reinforcement learning (RL) is crucial for real-world applications where exploration can be costly or unsafe. However, offline learned policies are often suboptimal, and further online fine-tuning is required. In this paper, we tackle the fundamental dilemma of offline-to-online fine-tuning: if the agent remains pessimistic, it may fail to learn a better policy, while if it becomes optimistic directly, performance may suffer from a sudden drop. We show that Bayesian design principles are crucial in solving such a dilemma. Instead of adopting optimistic or pessimistic policies, the agent should act in a way that matches its belief in optimal policies. Such a probability-matching agent can avoid a sudden performance drop while still being guaranteed to find the optimal policy. Based on our theoretical findings, we introduce a novel algorithm that outperforms existing methods on various benchmarks, demonstrating the efficacy of our approach. Overall, the proposed approach provides a new perspective on offline-to-online RL that has the potential to enable more effective learning from offline data.
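The sketch below illustrates the difference between pessimistic, optimistic, and probability-matching action selection. It is a generic, Thompson-sampling-style example and not the authors' algorithm. It assumes the agent's belief over Q-values is available as a matrix of sampled Q-vectors, for instance from an ensemble. The names `select_action`, `beta`, and `belief` are hypothetical.

```python
import numpy as np


def select_action(q_samples: np.ndarray, mode: str,
                  rng: np.random.Generator, beta: float = 1.0) -> int:
    """Pick an action from a belief over Q-values (hypothetical helper).

    q_samples: shape (n_samples, n_actions); each row is one plausible
    Q-vector drawn from the agent's belief (assumed, e.g. an ensemble).
    """
    mean = q_samples.mean(axis=0)
    std = q_samples.std(axis=0)
    if mode == "pessimistic":
        # Lower confidence bound: sticks to well-known actions.
        return int(np.argmax(mean - beta * std))
    if mode == "optimistic":
        # Upper confidence bound: always favours the most uncertain option.
        return int(np.argmax(mean + beta * std))
    if mode == "matching":
        # Probability matching: sample one plausible Q-vector and act
        # greedily on it, so each action is chosen with (approximately)
        # the probability that the belief assigns to it being optimal.
        k = rng.integers(q_samples.shape[0])
        return int(np.argmax(q_samples[k]))
    raise ValueError(f"unknown mode: {mode}")


# Toy belief: action 0 is known to be worth 1.0; action 1 is uncertain
# (mean 0.9) and beats action 0 in 2 of the 4 sampled Q-vectors.
belief = np.array([[1.0, 0.2],
                   [1.0, 1.8],
                   [1.0, 0.4],
                   [1.0, 1.2]])
```

With this toy belief, the pessimistic rule always picks action 0 and the optimistic rule always picks action 1. The matching rule picks action 1 about half the time, which is the belief's probability that action 1 is optimal.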

