Ensemble Successor Representations for Task Generalization in Offline-to-Online Reinforcement Learning

Abstract

In Reinforcement Learning (RL), training a policy from scratch with online experiences can be inefficient because of the difficulty of exploration. Recently, offline RL has offered a promising solution by providing an initial offline policy that can be refined through online interactions. However, existing approaches primarily perform offline and online learning on the same task, without considering the task generalization problem in offline-to-online adaptation. In real-world applications, it is common to have an offline dataset from only one specific task while aiming for fast online adaptation to several tasks. To address this problem, our work builds upon the investigation of successor representations for task generalization in online RL and extends the framework to offline-to-online learning. We demonstrate that the conventional paradigm using successor features can neither effectively utilize offline data nor improve performance on the new task through online fine-tuning. To mitigate this, we introduce a novel methodology that leverages offline data to acquire an ensemble of successor representations and subsequently constructs ensemble Q functions. This approach enables robust representation learning from datasets with different coverage and facilitates fast adaptation of Q functions to new tasks during the online fine-tuning phase. Extensive empirical evaluations provide compelling evidence of the superior performance of our method in generalizing to diverse or even unseen tasks.
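The abstract does not spell out how successor representations yield Q functions. The sketch below illustrates the standard successor-feature decomposition the work builds on. Rewards are assumed linear in features, r(s, a) = φ(s, a)ᵀw, so by linearity Q^π(s, a) = ψ^π(s, a)ᵀw, where ψ^π is the expected discounted sum of future features under policy π. This is a tabular toy under several assumptions of ours, not the authors' method: features φ are known, the policy is fixed and deterministic, ensemble members differ only in the sampled transition data they are computed from, and members are aggregated by mean and standard deviation. All function names (`successor_features`, `estimate_transitions`, `fit_task_weights`, `ensemble_q`) are hypothetical.

```python
import numpy as np


def successor_features(P, Phi, pi, gamma):
    """Closed-form successor features for a fixed deterministic policy.

    P:   (S, A, S) transition probabilities P[s, a, s']
    Phi: (S, A, d) state-action features phi(s, a)
    pi:  (S,) action taken by the policy in each state
    Solves psi(s, a) = phi(s, a) + gamma * sum_s' P[s, a, s'] psi(s', pi(s')).
    """
    S, A, d = Phi.shape
    # State-action transition matrix under pi: (s, a) -> (s', pi(s')).
    P_sa = np.zeros((S * A, S * A))
    for s_next in range(S):
        P_sa[:, s_next * A + pi[s_next]] = P[:, :, s_next].reshape(S * A)
    Psi = np.linalg.solve(np.eye(S * A) - gamma * P_sa, Phi.reshape(S * A, d))
    return Psi.reshape(S, A, d)


def estimate_transitions(rng, P, n_samples):
    """Hypothetical stand-in for an offline dataset: an empirical P_hat
    built from n_samples sampled next states per (s, a)."""
    S, A, _ = P.shape
    P_hat = np.zeros_like(P)
    for s in range(S):
        for a in range(A):
            nxt = rng.choice(S, size=n_samples, p=P[s, a])
            P_hat[s, a] = np.bincount(nxt, minlength=S) / n_samples
    return P_hat


def fit_task_weights(Phi, rewards):
    """Least-squares task vector w such that r(s, a) ~= phi(s, a)^T w."""
    S, A, d = Phi.shape
    w, *_ = np.linalg.lstsq(Phi.reshape(S * A, d), rewards.reshape(S * A), rcond=None)
    return w


def ensemble_q(Psis, w):
    """Member k gives Q_k(s, a) = psi_k(s, a)^T w; result has shape (K, S, A)."""
    return np.stack([Psi @ w for Psi in Psis])


rng = np.random.default_rng(0)
S, A, d, K, gamma = 5, 3, 4, 5, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))  # true dynamics, shape (S, A, S)
Phi = rng.normal(size=(S, A, d))            # shared features phi(s, a)
pi = rng.integers(A, size=S)                # fixed evaluation policy

# Each ensemble member is computed from its own sampled estimate of the dynamics.
Psis = [successor_features(estimate_transitions(rng, P, 50), Phi, pi, gamma)
        for _ in range(K)]

# New task: rewards are linear in phi with weights unknown to the agent.
w_true = rng.normal(size=d)
rewards = Phi @ w_true
w_hat = fit_task_weights(Phi, rewards)

Q_members = ensemble_q(Psis, w_hat)  # (K, S, A)
Q_mean = Q_members.mean(axis=0)      # aggregated ensemble estimate
Q_spread = Q_members.std(axis=0)     # disagreement between members
```

In this toy setup, moving to a new task only requires re-fitting the low-dimensional vector w while the successor features stay fixed, which is one way to read the abstract's claim of fast adaptation of Q functions. The spread across members is one possible signal of where the simulated data yields inconsistent estimates.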
