Abstract
Model-based reinforcement learning has attracted wide attention due to its superior sample efficiency. Despite its impressive success so far, it is still unclear how to appropriately schedule the important hyperparameters to achieve adequate performance, such as the real data ratio for policy optimization in Dyna-style model-based algorithms. In this paper, we first theoretically analyze the role of real data in policy training, which suggests that gradually increasing the ratio of real data yields better performance. Inspired by the analysis, we propose a framework named AutoMBPO to automatically schedule the real data ratio as well as other hyperparameters in training the model-based policy optimization (MBPO) algorithm, a representative running case of model-based methods. On several continuous control tasks, the MBPO instance trained with hyperparameters scheduled by AutoMBPO can significantly surpass the original one, and the real data ratio schedule found by AutoMBPO shows consistency with our theoretical analysis.