Continual Learning Using World Models for Pseudo-Rehearsal

Abstract

The utility of learning a dynamics/world model of the environment in reinforcement learning has been shown in many ways. When using neural networks, however, these models suffer catastrophic forgetting when learned in a lifelong or continual fashion. Current solutions to the continual learning problem require experience to be segmented and labeled as discrete tasks; in continuous experience, however, it is generally unclear what a sufficient segmentation of tasks would be. Here we propose a method to continually learn these internal world models through the interleaving of internally generated episodes of past experiences (i.e., pseudo-rehearsal). We show this method can sequentially learn unsupervised temporal prediction, without task labels, in a disparate set of Atari games. Empirically, this interleaving of the internally generated rollouts with the external environment's observations leads to a consistent reduction in temporal prediction loss compared to non-interleaved learning, and this reduction is preserved over repeated random exposures to various tasks. Similarly, using a network distillation approach, we show that modern policy-gradient-based reinforcement learning algorithms can use this internal model to continually learn to optimize reward based on the world model's representation of the environment.
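
As a rough illustration of the interleaving idea described in the abstract, the following Python sketch mixes real episodes with pseudo-episodes drawn from a generator. It is not the authors' implementation: the names `interleave_rollouts`, `sample_pseudo_episode`, and `pseudo_per_real` are hypothetical, the 1:1 mixing ratio is an assumption, and the toy generator merely stands in for sampling rollouts from a frozen copy of a previously trained world model.

```python
import random
from typing import Callable, List, Optional, Sequence

# Stand-in type: an episode is a list of scalar observations (assumption).
Episode = List[float]


def interleave_rollouts(
    real_episodes: Sequence[Episode],
    sample_pseudo_episode: Callable[[], Episode],
    pseudo_per_real: int = 1,
    rng: Optional[random.Random] = None,
) -> List[Episode]:
    """Return a shuffled training set of real and internally generated episodes.

    `sample_pseudo_episode` is a hypothetical callable that would, in a full
    system, roll out a frozen copy of the earlier world model.
    """
    rng = rng or random.Random(0)
    mixed = [list(ep) for ep in real_episodes]
    n_pseudo = pseudo_per_real * len(real_episodes)
    mixed.extend(sample_pseudo_episode() for _ in range(n_pseudo))
    rng.shuffle(mixed)  # interleave so neither source dominates a stretch of updates
    return mixed


# Toy usage: two real episodes and a generator that emits a fixed pseudo-episode.
real = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]


def toy_generator() -> Episode:
    return [-1.0, -1.0, -1.0]


mixed = interleave_rollouts(real, toy_generator)
```

In this sketch the world model would then be trained on `mixed` rather than on `real` alone, so that updates on the current game are accompanied by samples resembling earlier experience.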
