Minimal Value-Equivalent Partial Models for Scalable and Robust Planning in Lifelong Reinforcement Learning

Abstract

Learning models of the environment from pure interaction is often considered an essential component of building lifelong reinforcement learning agents. However, the common practice in model-based reinforcement learning is to learn models that model every aspect of the agent's environment, regardless of whether they are important in coming up with optimal decisions or not. In this paper, we argue that such models are not particularly well-suited for performing scalable and robust planning in lifelong reinforcement learning scenarios and we propose new kinds of models that only model the relevant aspects of the environment, which we call "minimal value-equivalent partial models". After providing a formal definition for these models, we provide theoretical results demonstrating the scalability advantages of performing planning with such models and then perform experiments to empirically illustrate our theoretical results. Then, we provide some useful heuristics on how to learn these kinds of models with deep learning architectures and empirically demonstrate that models learned in such a way can allow for performing planning that is robust to distribution shifts and compounding model errors. Overall, both our theoretical and empirical results suggest that minimal value-equivalent partial models can provide significant benefits to performing scalable and robust planning in lifelong reinforcement learning scenarios.
