When to Trust Your Simulator: Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning

  • 2023-01-11 14:53:50
  • Haoyi Niu, Shubham Sharma, Yiwen Qiu, Ming Li, Guyue Zhou, Jianming Hu, Xianyuan Zhan


Learning effective reinforcement learning (RL) policies to solve complex real-world tasks can be quite challenging without a high-fidelity simulation environment. In most cases, we are only given imperfect simulators with simplified dynamics, which inevitably lead to severe sim-to-real gaps in RL policy learning. The recently emerged field of offline RL provides another possibility: learning policies directly from pre-collected historical data. However, to achieve reasonable performance, existing offline RL algorithms need impractically large offline datasets with sufficient state-action space coverage for training. This raises a new question: is it possible to combine learning from limited real data in offline RL with unrestricted exploration through imperfect simulators in online RL to address the drawbacks of both approaches? In this study, we propose the Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning (H2O) framework to provide an affirmative answer to this question. H2O introduces a dynamics-aware policy evaluation scheme, which adaptively penalizes Q-function learning on simulated state-action pairs with large dynamics gaps, while simultaneously allowing learning from a fixed real-world dataset. Through extensive simulation and real-world tasks, as well as theoretical analysis, we demonstrate the superior performance of H2O against other cross-domain online and offline RL algorithms. H2O provides a brand new hybrid offline-and-online RL paradigm, which can potentially shed light on future RL algorithm design for solving practical real-world tasks.
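
The abstract does not give the form of the dynamics-aware penalty, so the following is only a minimal sketch of one way such a term could look, not the paper's actual objective. It assumes each simulated state-action pair comes with a non-negative dynamics-gap score, and it combines a gap-weighted log-sum-exp of simulated Q-values with the mean Q-value on real data. The function names `normalize_gaps` and `gap_weighted_penalty` and the exact formula are illustrative assumptions.

```python
import math


def normalize_gaps(gaps):
    """Turn non-negative dynamics-gap scores into weights that sum to 1.

    Assumption: a larger score means the simulator's dynamics differ more
    from the real dynamics at that state-action pair.
    """
    total = sum(gaps)
    if total <= 0:
        # No measurable gap anywhere: fall back to uniform weights.
        return [1.0 / len(gaps)] * len(gaps)
    return [g / total for g in gaps]


def gap_weighted_penalty(q_sim, gaps, q_real):
    """Illustrative regularizer (hypothetical form, not taken from the paper).

    Gap-weighted log-sum-exp of Q-values on simulated samples, minus the
    mean Q-value on real samples. Minimizing it lowers Q where the dynamics
    gap is large and raises Q on the real dataset.
    """
    weights = normalize_gaps(gaps)
    m = max(q_sim)  # subtract the max for numerical stability
    weighted_sum = sum(w * math.exp(q - m) for w, q in zip(weights, q_sim))
    log_sum_exp = m + math.log(weighted_sum)
    return log_sum_exp - sum(q_real) / len(q_real)


# The same simulated Q-values, with the gap placed on different samples.
high_q_has_gap = gap_weighted_penalty([0.0, 2.0], [0.0, 1.0], [0.0])
low_q_has_gap = gap_weighted_penalty([0.0, 2.0], [1.0, 0.0], [0.0])
print(high_q_has_gap, low_q_has_gap)
```

In this sketch the penalty is larger when the high Q-value sits on the sample with the large dynamics gap, which is the qualitative behavior the abstract describes. In a full training loop a term like this would presumably be added to a standard Bellman error computed on both simulated and real transitions; that combination is likewise an assumption here.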

