Abstract
Applying reinforcement learning (RL) to real-world problems requires addressing a trade-off between asymptotic performance, sample efficiency, and inference time. In this work, we demonstrate how to address this triple challenge by leveraging partial physical knowledge about the system dynamics. Our approach learns a physics-informed model to boost sample efficiency and generates imaginary trajectories from this model to learn a model-free policy and Q-function. Furthermore, we propose a hybrid planning strategy that combines the learned policy and Q-function with the learned model to reduce planning time. Through practical demonstrations, we show that our method achieves a better trade-off among sample efficiency, time efficiency, and performance than state-of-the-art methods.
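To make the two ingredients above concrete, here is a minimal sketch of one generic way they can fit together. It is not the authors' algorithm. It assumes a toy 1-D point mass as the system, a "physics-informed" model formed as a known integrator plus a learned residual correction, and policy-guided random shooting as the planner: each candidate rolls out the policy (with Gaussian exploration noise) through the model for a short horizon, and the Q-function bootstraps the return beyond that horizon so the planner does not need to look further ahead. The names `physics_prior`, `make_hybrid_model`, `reward`, and `plan`, as well as the dynamics, reward, and hyperparameters, are hypothetical illustrations.

```python
import random
from typing import Callable, Tuple

# Hypothetical toy system: state = (position, velocity) of a 1-D point mass.
State = Tuple[float, float]


def physics_prior(s: State, a: float, dt: float = 0.1) -> State:
    """Known (partial) physics: frictionless point-mass Euler integration (assumption)."""
    pos, vel = s
    return (pos + dt * vel, vel + dt * a)


def make_hybrid_model(
    residual: Callable[[State, float], State]
) -> Callable[[State, float], State]:
    """Physics-informed model = analytic prior + learned residual correction."""
    def model(s: State, a: float) -> State:
        p_pos, p_vel = physics_prior(s, a)
        r_pos, r_vel = residual(s, a)  # stands in for a trained network (hypothetical)
        return (p_pos + r_pos, p_vel + r_vel)
    return model


def reward(s: State, a: float) -> float:
    """Hypothetical quadratic cost: drive the mass to the origin with small actions."""
    pos, vel = s
    return -(pos ** 2 + 0.1 * vel ** 2 + 0.01 * a ** 2)


def plan(
    state: State,
    model: Callable[[State, float], State],
    policy: Callable[[State], float],
    q_fn: Callable[[State, float], float],
    horizon: int = 5,
    n_candidates: int = 32,
    noise: float = 0.3,
    gamma: float = 0.99,
    rng: random.Random = None,
) -> float:
    """Policy-guided random shooting with a Q-function terminal value.

    Returns the first action of the highest-scoring candidate rollout.
    """
    rng = rng or random.Random(0)  # seeded by default so planning is reproducible
    best_return, best_action = float("-inf"), 0.0
    for _ in range(n_candidates):
        s, ret, discount, first_action = state, 0.0, 1.0, None
        for _t in range(horizon):
            # Perturb the learned policy's action to explore around it.
            a = policy(s) + rng.gauss(0.0, noise)
            if first_action is None:
                first_action = a
            ret += discount * reward(s, a)
            s = model(s, a)  # imagined transition through the learned model
            discount *= gamma
        # Bootstrap the tail with the Q-function instead of planning further ahead.
        ret += discount * q_fn(s, policy(s))
        if ret > best_return:
            best_return, best_action = ret, first_action
    return best_action
```

The horizon is the knob behind the time-efficiency argument in this sketch: a shorter horizon means fewer model calls per decision, with the Q-function compensating for the truncated lookahead. With `noise=0.0`, every candidate is identical and the planner simply returns the policy's action.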