Objective Mismatch in Model-based Reinforcement Learning

Abstract

Model-based reinforcement learning (MBRL) has been shown to be a powerful framework for data-efficiently learning control of continuous tasks. Recent work in MBRL has mostly focused on using more advanced function approximators and planning schemes, with little development of the general framework. In this paper, we identify a fundamental issue of the standard MBRL framework, which we call the objective mismatch issue. Objective mismatch arises when one objective is optimized in the hope that a second, often uncorrelated, metric will also be optimized. In the context of MBRL, we characterize the objective mismatch between training the forward dynamics model with respect to the likelihood of the one-step-ahead prediction and the overall goal of improving performance on a downstream control task. For example, this issue can emerge with the realization that dynamics models effective for a specific task do not necessarily need to be globally accurate and, conversely, that globally accurate models might not be sufficiently accurate locally to obtain good control performance on a specific task. In our experiments, we study this objective mismatch issue and demonstrate that the likelihood of one-step-ahead predictions is not always correlated with control performance. This observation highlights a critical limitation in the MBRL framework which will require further research to be fully understood and addressed. We propose an initial method to mitigate the mismatch issue by re-weighting dynamics model training. Building on it, we conclude with a discussion about other potential directions of research for addressing this issue.
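
To make the idea of re-weighting dynamics model training concrete, the following is a minimal sketch, not the method used in the paper. It assumes a linear dynamics model, a squared-error stand-in for the one-step likelihood, and a hypothetical weighting scheme in which transitions closer to a set of task-relevant reference states receive larger weight. The function names, the exponential weighting, and the `bandwidth` parameter are all illustrative assumptions.

```python
import numpy as np


def distance_weights(states, reference_states, bandwidth=1.0):
    """Hypothetical weighting: weight decays with distance to the nearest
    task-relevant reference state (assumption, not taken from the paper)."""
    diffs = states[:, None, :] - reference_states[None, :, :]
    nearest = np.min(np.linalg.norm(diffs, axis=2), axis=1)
    return np.exp(-nearest / bandwidth)


def weighted_one_step_loss(pred_next, true_next, weights):
    """Weighted mean of squared one-step prediction errors."""
    per_sample = np.sum((pred_next - true_next) ** 2, axis=1)
    return float(np.sum(weights * per_sample) / np.sum(weights))


def fit_weighted_linear_model(inputs, targets, weights):
    """Weighted least-squares fit of targets ~= inputs @ A."""
    # Scaling rows by sqrt(weight) turns weighted least squares
    # into an ordinary least-squares problem.
    sw = np.sqrt(weights)[:, None]
    A, *_ = np.linalg.lstsq(inputs * sw, targets * sw, rcond=None)
    return A
```

With uniform weights this reduces to ordinary one-step regression; non-uniform weights trade global accuracy for accuracy in the region the weights emphasize, which is the trade-off the abstract describes.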
