Abstract
In model-based reinforcement learning, the agent interleaves model learning and planning. These two components are inextricably intertwined. If the model cannot provide sensible long-term predictions, the planner will exploit model flaws, which can yield catastrophic failures. This paper focuses on building a model that reasons about the long-term future and demonstrates how to use it for efficient planning and exploration. To this end, we build a latent-variable autoregressive model by leveraging recent ideas in variational inference. We argue that forcing latent variables to carry future information through an auxiliary task substantially improves long-term predictions. Moreover, by planning in the latent space, the planner's solution is ensured to be within regions where the model is valid. An exploration strategy can be devised by searching for unlikely trajectories under the model. Our method achieves higher reward faster than baselines on a variety of tasks and environments in both the imitation learning and model-based reinforcement learning settings.