Single-Task Continual Offline Reinforcement Learning

Abstract

In this paper, we study the continual learning problem of single-task offline reinforcement learning. Previous work on continual reinforcement learning has usually dealt only with the multi-task setting, that is, learning multiple related or unrelated tasks in sequence; once a task has been learned, it is not relearned but only used in subsequent processes. However, offline reinforcement learning requires continually learning from multiple different datasets for the same task. Existing algorithms try to achieve the best results on each offline dataset they learn, so the skills the network acquired from high-quality datasets are overwritten after it learns from subsequent poorer datasets. On the other hand, if too much emphasis is placed on stability, a network that has learned from a poor offline dataset will lack the plasticity to learn from a subsequent better dataset and will fail to learn from it. How to design a strategy that always preserves the best performance for each state across the data that has been learned is a new challenge and the focus of this study. We therefore propose a new algorithm, called Ensemble Offline Reinforcement Learning Based on Experience Replay, which introduces multiple value networks to learn the same dataset and judges whether the strategy has been learned by the degree of dispersion among the value networks, to improve the performance of the network in single-task offline reinforcement learning.
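To make the dispersion idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes that "dispersion" can be measured as the standard deviation of the ensemble's value estimates for a state, and that a state counts as "learned" when that deviation falls below a threshold. The function names, the threshold, and the sample values are all hypothetical.

```python
from statistics import pstdev


def ensemble_dispersion(q_values):
    """Population standard deviation of one state's value estimates
    across the ensemble (assumed dispersion measure)."""
    return pstdev(q_values)


def learned_mask(q_by_state, threshold):
    """Flag a state as 'learned' when the ensemble members agree closely,
    i.e. their dispersion does not exceed the (hypothetical) threshold."""
    return [ensemble_dispersion(qs) <= threshold for qs in q_by_state]


# Hypothetical estimates from 4 value networks for 3 states.
q_by_state = [
    [1.00, 1.02, 0.98, 1.00],   # members agree closely
    [0.20, 1.50, -0.70, 0.90],  # members disagree strongly
    [2.00, 2.00, 2.00, 2.00],   # members agree exactly
]

mask = learned_mask(q_by_state, threshold=0.1)
print(mask)
```

Under these assumptions, states with low disagreement are treated as already covered by the data seen so far, while high disagreement marks states where the policy may still need to be updated from the new dataset.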
