Lifelong Reinforcement Learning with Similarity-Driven Weighting by Large Models

Abstract

Lifelong Reinforcement Learning (LRL) holds significant potential for addressing sequential tasks, but it still faces considerable challenges. A key difficulty lies in effectively preventing catastrophic forgetting and facilitating knowledge transfer while maintaining reliable decision-making performance across subsequent tasks in dynamic environments. To tackle this, we propose a novel framework, SDW (Similarity-Driven Weighting Framework), which leverages large-language-model-generated dynamic functions to precisely control the training process. The core of SDW lies in two functions pre-generated by large models: the task similarity function and the weight computation function. The task similarity function extracts multidimensional features from task descriptions to quantify the similarities and differences between tasks in terms of states, actions, and rewards. The weight computation function dynamically generates critical training parameters based on the similarity information, including the proportion of old task data stored in the Replay Buffer and the strategy consistency weight in the loss function, enabling an adaptive balance between learning new tasks and transferring knowledge from previous tasks. By generating function code offline prior to training, rather than relying on large-model inference during the training process, the SDW framework reduces computational overhead while maintaining efficiency in sequential task scenarios. Experimental results on Atari and MiniHack sequential tasks demonstrate that SDW significantly outperforms existing lifelong reinforcement learning methods.
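
To make the two-function design concrete, here is a minimal Python sketch of what a pre-generated task similarity function and weight computation function might look like. It is illustrative only and is not the code SDW actually generates. Every name in it (TaskDescription, task_similarity, compute_weights), the use of Jaccard overlap as the similarity measure, the parameter ranges, and the direction of the mapping (less similar tasks get a larger share of old-task replay data, more similar tasks get a larger consistency weight) is an assumption, not something stated in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskDescription:
    # Hypothetical feature sets extracted from a task description.
    state_features: frozenset
    actions: frozenset
    reward_events: frozenset


def jaccard(a: frozenset, b: frozenset) -> float:
    """Set overlap in [0, 1]; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def task_similarity(old: TaskDescription, new: TaskDescription) -> dict:
    """Assumed similarity function: one score per dimension (state, action, reward)."""
    return {
        "state": jaccard(old.state_features, new.state_features),
        "action": jaccard(old.actions, new.actions),
        "reward": jaccard(old.reward_events, new.reward_events),
    }


def compute_weights(sim: dict, min_replay: float = 0.1, max_replay: float = 0.5,
                    max_consistency: float = 1.0) -> dict:
    """Assumed weight function mapping similarity scores to training parameters."""
    mean_sim = sum(sim.values()) / len(sim)
    # Assumption: dissimilar tasks risk more forgetting, so replay more old data.
    replay_ratio = min_replay + (max_replay - min_replay) * (1.0 - mean_sim)
    # Assumption: similar tasks can share more of the old policy's behaviour.
    consistency_weight = max_consistency * mean_sim
    return {"replay_ratio": replay_ratio, "consistency_weight": consistency_weight}
```

In this sketch both functions are plain Python with no model calls, which mirrors the abstract's point that the functions are generated offline and that no large-model inference is needed during training.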
