Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control

Abstract

Trial-and-error based reinforcement learning (RL) has seen rapid advancements in recent times, especially with the advent of deep neural networks. However, the majority of autonomous RL algorithms require a large number of interactions with the environment. A large number of interactions may be impractical in many real-world applications, such as robotics, and many practical systems have to obey limitations in the form of state space or control constraints. To reduce the number of system interactions while simultaneously handling constraints, we propose a model-based RL framework based on probabilistic Model Predictive Control (MPC). In particular, we propose to learn a probabilistic transition model using Gaussian Processes (GPs) to incorporate model uncertainty into long-term predictions, thereby reducing the impact of model errors. We then use MPC to find a control sequence that minimises the expected long-term cost. We provide theoretical guarantees for first-order optimality in the GP-based transition models with deterministic approximate inference for long-term planning. We demonstrate that our approach not only achieves state-of-the-art data efficiency, but also is a principled way for RL in constrained environments.
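
To make the loop described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It fits a GP to observed state changes, plans over a short horizon under a control constraint, applies only the first control, and refits on the new data. Several pieces are assumptions made for illustration: the 1D toy dynamics `true_step`, the fixed kernel hyperparameters `ELL`, `SF2` and `NOISE`, the bound |u| <= 1, and the quadratic cost. Two simplifications replace the techniques named in the abstract: uncertainty is handled by adding the one-step predictive variance to the cost along the mean trajectory (not deterministic approximate inference over full state distributions), and the control sequence is chosen by random shooting (not by an optimiser with first-order optimality guarantees).

```python
import numpy as np

# Assumed, fixed kernel hyperparameters (a real system would learn these).
ELL, SF2, NOISE = 1.0, 0.01, 1e-6


def rbf(A, B):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return SF2 * np.exp(-0.5 * d2 / ELL**2)


class GPModel:
    """Zero-mean GP regression from (state, control) to the state change."""

    def __init__(self, X, y):
        self.X = X
        K = rbf(X, X) + NOISE * np.eye(len(X))
        self.L = np.linalg.cholesky(K)
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, y))

    def predict(self, Xs):
        """Return the predictive mean and variance at each row of Xs."""
        Ks = rbf(Xs, self.X)
        mean = Ks @ self.alpha
        v = np.linalg.solve(self.L, Ks.T)
        var = np.clip(SF2 - (v**2).sum(0), 0.0, None)
        return mean, var


def true_step(x, u):
    """Hypothetical toy dynamics, unknown to the planner."""
    return x + 0.1 * u


def plan(gp, x0, target, horizon=5, n_samples=200, rng=None):
    """Random-shooting MPC: return the first control of the cheapest sequence."""
    if rng is None:
        rng = np.random.default_rng(0)
    # Sampling inside [-1, 1] enforces the (assumed) control constraint.
    U = rng.uniform(-1.0, 1.0, size=(n_samples, horizon))
    x = np.full(n_samples, float(x0))
    cost = np.zeros(n_samples)
    for t in range(horizon):
        mean, var = gp.predict(np.column_stack([x, U[:, t]]))
        x = x + mean  # propagate the predicted mean state
        # Squared error of the mean plus predictive variance as a crude
        # stand-in for the expected quadratic cost.
        cost += (x - target) ** 2 + var
    return U[np.argmin(cost), 0]


def run_mpc(steps=40, target=1.0, seed=0):
    """Alternate between refitting the GP and applying one MPC control."""
    rng = np.random.default_rng(seed)
    # A small batch of random interactions to initialise the model.
    X = np.column_stack([rng.uniform(-0.5, 1.5, 10), rng.uniform(-1.0, 1.0, 10)])
    y = true_step(X[:, 0], X[:, 1]) - X[:, 0]
    x = 0.0
    for _ in range(steps):
        gp = GPModel(X, y)
        u = plan(gp, x, target, rng=rng)
        x_next = true_step(x, u)
        X = np.vstack([X, [x, u]])
        y = np.append(y, x_next - x)
        x = x_next
    return x


if __name__ == "__main__":
    print("final state:", run_mpc())
```

Only the first control of each planned sequence is applied before replanning, which is the receding-horizon pattern that lets MPC correct for model error as new data arrives.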
