Learning Accurate Long-term Dynamics for Model-based Reinforcement Learning

Abstract

Accurately predicting the dynamics of robotic systems is crucial formodel-based control and reinforcement learning. The most common way to estimatedynamics is by fitting a one-step ahead prediction model and using it torecursively propagate the predicted state distribution over long horizons.Unfortunately, this approach is known to compound even small prediction errors,making long-term predictions inaccurate. In this paper, we propose a newparametrization to supervised learning on state-action data to stably predictat longer horizons -- that we call a trajectory-based model. Thistrajectory-based model takes an initial state, a future time index, and controlparameters as inputs, and directly predicts the state at the future time index.Experimental results in simulated and real-world robotic tasks show thattrajectory-based models yield significantly more accurate long termpredictions, improved sample efficiency, and the ability to predict taskreward. With these improved prediction properties, we conclude with ademonstration of methods for using the trajectory-based model for control.

Quick Read (beta)

loading the full paper ...