Hierarchical Subspaces of Policies for Continual Offline Reinforcement Learning

Abstract

We consider a Continual Reinforcement Learning setup, where a learning agent must continuously adapt to new tasks while retaining previously acquired skills, with a focus on avoiding the forgetting of past knowledge and ensuring scalability as the number of tasks grows. Such issues prevail in autonomous robotics and video game simulations, notably for navigation tasks prone to topological or kinematic changes. To address these issues, we introduce HiSPO, a novel hierarchical framework designed specifically for continual learning in navigation settings from offline data. Our method leverages distinct policy subspaces of neural networks to enable flexible and efficient adaptation to new tasks while preserving existing knowledge. We demonstrate, through a careful experimental study, the effectiveness of our method in both classical MuJoCo maze environments and complex video game-like navigation simulations, showcasing competitive performance and satisfactory adaptability with respect to classical continual learning metrics, in particular regarding memory usage and efficiency.
