A Multi-step Loss Function for Robust Learning of the Dynamics in Model-based Reinforcement Learning

Abstract

In model-based reinforcement learning, most algorithms rely on simulating trajectories from one-step models of the dynamics learned on data. A critical challenge of this approach is the compounding of one-step prediction errors as the length of the trajectory grows. In this paper we tackle this issue by using a multi-step objective to train one-step models. Our objective is a weighted sum of the mean squared error (MSE) loss at various future horizons. We find that this new loss is particularly useful when the data is noisy (additive Gaussian noise in the observations), which is often the case in real-life environments. To support the multi-step loss, first we study its properties in two tractable cases: i) uni-dimensional linear system, and ii) two-parameter non-linear system. Second, we show in a variety of tasks (environments or datasets) that the models learned with this loss achieve a significant improvement in terms of the averaged R²-score on future prediction horizons. Finally, in the pure batch reinforcement learning setting, we demonstrate that one-step models serve as strong baselines when dynamics are deterministic, while multi-step models would be more advantageous in the presence of noise, highlighting the potential of our approach in real-world applications.
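To make the objective concrete, the following is a minimal sketch of a weighted multi-step MSE, not the implementation used in the paper. It assumes an action-free system for brevity, and the names `multi_step_mse`, `step_fn`, and `weights` are hypothetical. The one-step model is applied recursively to its own predictions, and the MSE at each horizon is weighted and summed.

```python
import numpy as np


def multi_step_mse(step_fn, trajectory, weights):
    """Weighted sum of MSE losses at horizons 1..len(weights).

    step_fn    : hypothetical one-step model, maps states (N, d) -> (N, d).
    trajectory : observed states, shape (T, d).
    weights    : one non-negative weight per horizon (assumed given).
    """
    trajectory = np.asarray(trajectory, dtype=float)
    weights = np.asarray(weights, dtype=float)
    horizon = len(weights)
    n_starts = trajectory.shape[0] - horizon
    if n_starts <= 0:
        raise ValueError("trajectory must be longer than the number of horizons")

    pred = trajectory[:n_starts]  # every state with a full horizon ahead of it
    loss = 0.0
    for j in range(1, horizon + 1):
        pred = step_fn(pred)  # feed the model its own previous prediction
        target = trajectory[j:j + n_starts]  # observations j steps ahead
        loss += weights[j - 1] * np.mean((pred - target) ** 2)
    return loss


# Toy uni-dimensional linear system s_{t+1} = 0.9 * s_t (illustrative values).
traj = 0.9 ** np.arange(6).reshape(-1, 1)
loss_true = multi_step_mse(lambda s: 0.9 * s, traj, [0.5, 0.5])
loss_biased = multi_step_mse(lambda s: 0.8 * s, traj, [0.5, 0.5])
loss_one_step = multi_step_mse(lambda s: 0.8 * s, traj, [1.0, 0.0])
```

Putting all the weight on the first horizon, as in `loss_one_step`, recovers the usual one-step MSE, so the standard objective is a special case of this sketch.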
