Abstract
In this work, we investigate how curiosity can be used in replay buffers to improve offline multi-task continual reinforcement learning when tasks, which are defined by non-stationarity in the environment, are unlabeled and not evenly exposed to the learner over time. In particular, we investigate the use of curiosity both as a tool for task boundary detection and as a priority metric for retaining old transition tuples, and we use each of these to propose a different buffer. First, we propose a Hybrid Reservoir Buffer with Task Separation (HRBTS), in which curiosity is used to detect task boundaries that are unknown because the problem is task-agnostic. Second, by using curiosity as a priority metric for retaining old transition tuples, we propose a Hybrid Curious Buffer (HCB). We show that these buffers, in conjunction with regular reinforcement learning algorithms, can alleviate the catastrophic forgetting suffered by state-of-the-art replay buffers when the agent's exposure to tasks is unequal over time. We evaluate catastrophic forgetting and the efficiency of our proposed buffers against recent works such as the Hybrid Reservoir Buffer (HRB) and the Multi-Time Scale Replay Buffer (MTR) in three different continual reinforcement learning settings. Experiments were conducted on classical control tasks and the Metaworld environment, and they show that our proposed replay buffers are more robust to catastrophic forgetting than existing works in most of the settings.
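To make the two uses of curiosity concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation of HRBTS or HCB. It assumes a per-transition curiosity score is already available (for example, the prediction error of a learned forward model). The buffer is assumed to be "hybrid" in the sense of a FIFO of recent transitions plus a long-term store, and its long-term store keeps the highest-curiosity transitions. The boundary detector flags a task change when recent mean curiosity exceeds a multiple of the running mean. The class names, `window`, and `threshold` are all hypothetical.

```python
import heapq
import itertools
import random
from collections import deque


class CuriosityPriorityBuffer:
    """Hypothetical sketch: a FIFO of recent transitions plus a long-term
    store that retains the transitions with the highest curiosity scores."""

    def __init__(self, fifo_capacity, long_term_capacity):
        self.fifo = deque(maxlen=fifo_capacity)  # oldest entries drop out automatically
        self.long_term_capacity = long_term_capacity
        self.long_term = []  # min-heap of (curiosity, tie_breaker, transition)
        self._counter = itertools.count()  # avoids comparing transitions on ties

    def add(self, transition, curiosity):
        self.fifo.append(transition)
        entry = (curiosity, next(self._counter), transition)
        if len(self.long_term) < self.long_term_capacity:
            heapq.heappush(self.long_term, entry)
        elif curiosity > self.long_term[0][0]:
            # Evict the least curious retained transition in favor of the new one.
            heapq.heapreplace(self.long_term, entry)

    def sample(self, batch_size, rng=random):
        # Uniform sampling over both stores (an assumed mixing rule).
        pool = list(self.fifo) + [t for _, _, t in self.long_term]
        return rng.sample(pool, min(batch_size, len(pool)))


class CuriosityBoundaryDetector:
    """Hypothetical sketch: flags a task boundary when the mean curiosity over
    the last `window` steps exceeds `threshold` times the long-run mean."""

    def __init__(self, window=50, threshold=2.0):
        self.recent = deque(maxlen=window)
        self.threshold = threshold
        self.total = 0.0
        self.count = 0

    def update(self, curiosity):
        # Baseline is the running mean before this observation is added.
        baseline = self.total / self.count if self.count else None
        self.recent.append(curiosity)
        self.total += curiosity
        self.count += 1
        if baseline is None or len(self.recent) < self.recent.maxlen:
            return False
        if sum(self.recent) / len(self.recent) > self.threshold * baseline:
            # Reset statistics so the new task establishes its own baseline.
            self.recent.clear()
            self.total, self.count = 0.0, 0
            return True
        return False
```

Using a min-heap keyed on curiosity makes each eviction decision O(log n). In this sketch a detected boundary could be used, for example, to start a separate per-task store, which is one plausible reading of "task separation".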