Augmenting Replay in World Models for Continual Reinforcement Learning

Abstract

In continual reinforcement learning (RL), the environment of an RL agent changes over time. A successful system should balance two conflicting requirements: retaining performance on already learned tasks (stability) while learning new tasks (plasticity). The first-in-first-out buffer is commonly used to enhance learning in such settings, but it requires significant memory. We explore an augmentation to this buffer that alleviates the memory constraints, and we use it with a model-based RL algorithm built on a world model to evaluate its effectiveness in facilitating continual learning. We evaluate our method on the Procgen and Atari RL benchmarks and show that the distribution matching augmentation to the replay buffer, used in the context of latent world models, can successfully prevent catastrophic forgetting with significantly reduced computational overhead. However, we also find that such a solution is not infallible: other failure modes, such as the opposite problem of lacking plasticity and being unable to learn a new task, remain a potential limitation in continual learning systems.
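To make the buffer idea concrete, the following is a minimal sketch of how a short first-in-first-out buffer might be combined with a distribution matching buffer. The abstract does not say how distribution matching is implemented; reservoir sampling is assumed here as one standard way to keep a uniform sample of everything seen so far in fixed memory. The class name `AugmentedReplayBuffer`, its parameters, and the 50/50 sampling split are all hypothetical choices for illustration, not the authors' implementation.

```python
import random
from collections import deque


class AugmentedReplayBuffer:
    """Hypothetical sketch: small FIFO buffer plus a reservoir-sampled buffer.

    Assumption: distribution matching is approximated by reservoir sampling,
    which keeps a uniform random subset of all items ever added.
    """

    def __init__(self, fifo_capacity, reservoir_capacity, seed=0):
        self.fifo = deque(maxlen=fifo_capacity)  # recent experience only
        self.reservoir = []                      # uniform sample of all experience
        self.reservoir_capacity = reservoir_capacity
        self.num_seen = 0
        self.rng = random.Random(seed)

    def add(self, item):
        # The FIFO side silently evicts its oldest item once full.
        self.fifo.append(item)
        self.num_seen += 1
        if len(self.reservoir) < self.reservoir_capacity:
            self.reservoir.append(item)
        else:
            # Standard reservoir step: keep the new item with
            # probability reservoir_capacity / num_seen.
            j = self.rng.randrange(self.num_seen)
            if j < self.reservoir_capacity:
                self.reservoir[j] = item

    def sample(self, batch_size):
        if not self.fifo:
            raise ValueError("cannot sample from an empty buffer")
        batch = []
        for _ in range(batch_size):
            # Assumed 50/50 split between recent and long-term experience.
            source = self.fifo if self.rng.random() < 0.5 else self.reservoir
            batch.append(self.rng.choice(source))
        return batch
```

In this sketch the FIFO side holds only the most recent items, which supports learning the current task, while the reservoir side retains items from earlier tasks in bounded memory, which is the property that would help against forgetting. In practice the stored items would be trajectories or episode segments instead of single values.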
