Abstract
In this work, we propose a new setting of continual learning: data-incremental continual offline reinforcement learning (DICORL), in which an agent is asked to learn continually from a sequence of datasets of a single offline reinforcement learning (RL) task, instead of learning a sequence of offline RL tasks, each with its own dataset. We then argue that this new setting introduces a unique challenge to continual learning: active forgetting, meaning that the agent actively forgets skills it has already learnt. The main cause of active forgetting is the conservative learning that offline RL uses to address the overestimation problem. Under conservative learning, an offline RL method suppresses the values of all actions indiscriminately, whether learnt or not, unless they appear in the dataset currently being learnt. As a result, depending on the learning order, inferior data may overwrite premium data. To solve this problem, we propose a new algorithm, called experience-replay-based ensemble implicit Q-learning (EREIQL). EREIQL introduces multiple value networks to reduce the initial value and avoid conservative learning, and it uses experience replay to relieve catastrophic forgetting. Our experiments show that EREIQL relieves active forgetting in DICORL and performs well.
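The active-forgetting mechanism described above can be illustrated with a toy example. This is a sketch under stated assumptions and is not the authors' method. It assumes a single-state, tabular Q with three actions and uses a CQL-style conservative term (logsumexp over all actions minus the mean Q of dataset actions) as one common form of conservative learning. The abstract does not say which conservative method it has in mind. The function `train`, the datasets `premium` and `inferior`, the rewards, and the hyperparameters are all hypothetical. Setting `alpha = 0` reduces the loss to plain regression, which is only a contrast case and not IQL or EREIQL.

```python
import numpy as np


def train(q, data, alpha, lr=0.1, steps=500):
    """Gradient descent on a CQL-style loss for a single-state tabular Q (toy, hypothetical).

    data:  list of (action, reward) pairs from the *current* dataset only.
    alpha: weight of the conservative term; alpha = 0 gives plain regression.
    """
    q = q.copy()
    n = len(data)
    counts = np.zeros_like(q)
    for a, _ in data:
        counts[a] += 1
    for _ in range(steps):
        soft = np.exp(q - q.max())
        soft /= soft.sum()  # softmax(q) is the gradient of logsumexp(q)
        # Conservative term: logsumexp pushes every action down; only actions
        # present in the current dataset are pushed back up.
        grad = alpha * (soft - counts / n)
        # Regression term: fit each dataset action's Q to its observed reward.
        for a, r in data:
            grad[a] += (q[a] - r) / n
        q -= lr * grad
    return q


premium = [(0, 1.0), (1, 0.3)]  # hypothetical dataset 1: contains high-reward action 0
inferior = [(1, 0.3)]           # hypothetical dataset 2: only low-reward action 1

for alpha in (1.0, 0.0):
    q1 = train(np.zeros(3), premium, alpha)  # learn dataset 1
    q2 = train(q1, inferior, alpha)          # then continue on dataset 2 only
    print(f"alpha={alpha}: greedy after D1 = {q1.argmax()}, after D2 = {q2.argmax()}")
# alpha=1.0: greedy after D1 = 0, after D2 = 1
# alpha=0.0: greedy after D1 = 0, after D2 = 0
```

With the conservative term, action 0 is absent from the second dataset, so its value keeps receiving the downward logsumexp gradient. The greedy choice then flips to the inferior action. Without the term, the value of action 0 is left untouched. This toy has no bootstrapping, so it does not show the overestimation problem that motivates conservative learning in the first place.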