Algorithmic Framework for Model-based Reinforcement Learning with Theoretical Guarantees

Abstract

While model-based reinforcement learning has empirically been shown to significantly reduce the sample complexity that hinders model-free RL, the theoretical understanding of such methods has been rather limited. In this paper, we introduce a novel algorithmic framework for designing and analyzing model-based RL algorithms with theoretical guarantees, and a practical algorithm, Optimistic Lower Bounds Optimization (OLBO). In particular, we derive a theoretical guarantee of monotone improvement for model-based RL within our framework. We iteratively build a lower bound on the expected reward based on the estimated dynamical model and sampled trajectories, and maximize it jointly over the policy and the model. Assuming the optimization in each iteration succeeds, the expected reward is guaranteed to improve. The framework also incorporates an optimism-driven perspective and reveals the intrinsic measure of the model prediction error. Preliminary simulations demonstrate that our approach outperforms the standard baselines on continuous control benchmark tasks.
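The monotone-improvement claim follows a familiar lower-bound (minorize-maximize) argument, sketched below under stated assumptions. The notation is illustrative and not taken from the paper: $V^{\pi}$ is the true expected reward of policy $\pi$, $L_k(\pi, M)$ is any lower bound built at iteration $k$ from an estimated model $M$ and sampled trajectories, and $M^\star$ is the true dynamics, assumed to lie in the model class being searched. The two assumed conditions are (a) $L_k(\pi, M) \le V^{\pi}$ for every feasible $\pi$ and $M$, and (b) the bound is tight at the current policy under the true model, $L_k(\pi_k, M^\star) = V^{\pi_k}$. If $(\pi_{k+1}, M_{k+1})$ maximizes $L_k$ jointly over a feasible set that contains $(\pi_k, M^\star)$, then:

```latex
% (a): lower bound;  (opt): joint maximization;  (b): tightness at (\pi_k, M^\star)
V^{\pi_{k+1}}
  \;\overset{\text{(a)}}{\ge}\; L_k(\pi_{k+1}, M_{k+1})
  \;\overset{\text{(opt)}}{\ge}\; L_k(\pi_k, M^\star)
  \;\overset{\text{(b)}}{=}\; V^{\pi_k}
```

So each successful iteration can only keep or raise the true expected reward. This is why the guarantee hinges on the optimization in each iteration succeeding.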
