Abstract
Model-free reinforcement learning (RL) is capable of learning control policies for high-dimensional, complex robotic tasks, but tends to be data-inefficient. Model-based RL and optimal control have been proven to be much more data-efficient if an accurate model of the system and environment is known, but can be difficult to scale to expressive models for high-dimensional problems. In this paper, we propose a novel approach to alleviate the data inefficiency of model-free RL by warm-starting the learning process using a lower-dimensional model-based solution. In particular, we propose a baseline function that is initialized via supervision from a low-dimensional value function. Such a low-dimensional value function can be obtained by applying model-based techniques to a low-dimensional problem featuring a known approximate system model. Therefore, our approach exploits the model priors from a simplified problem space implicitly and avoids the direct use of high-dimensional, expressive models. We demonstrate our approach on two representative robotic learning tasks and observe significant improvement in performance and efficiency, and analyze our method empirically with a third task.