The Benefits of Model-Based Generalization in Reinforcement Learning

Abstract

Model-Based Reinforcement Learning (RL) is widely believed to have the potential to improve sample efficiency by allowing an agent to synthesize large amounts of imagined experience. Experience Replay (ER) can be considered a simple kind of model, which has proved extremely effective at improving the stability and efficiency of deep RL. In principle, a learned parametric model could improve on ER by generalizing from real experience to augment the dataset with additional plausible experience. However, owing to the many design choices involved in empirically successful algorithms, it can be very hard to establish where the benefits are actually coming from. Here, we provide theoretical and empirical insight into when, and how, we can expect data generated by a learned model to be useful. First, we provide a general theorem motivating how learning a model as an intermediate step can narrow down the set of possible value functions more than learning a value function directly from data using the Bellman equation. Second, we provide an illustrative example showing empirically how a similar effect occurs in a more concrete setting with neural network function approximation. Finally, we provide extensive experiments showing the benefit of model-based learning for online RL in environments with combinatorial complexity, but factored structure that allows a learned model to generalize. In these experiments, we take care to control for other factors in order to isolate, insofar as possible, the benefit of using experience generated by a learned model relative to ER alone.
