GeRe: Towards Efficient Anti-Forgetting in Continual Learning of LLM via General Samples Replay

Abstract

The continual learning capability of large language models (LLMs) is crucial for advancing artificial general intelligence. However, continually fine-tuning LLMs across various domains often suffers from catastrophic forgetting, characterized by: 1) significant forgetting of their general capabilities, and 2) sharp performance declines on previously learned tasks. To address both issues simultaneously in a simple yet stable manner, we propose General Sample Replay (GeRe), a framework that uses ordinary pretraining texts for efficient anti-forgetting. Beyond revisiting the most prevalent replay-based practices under GeRe, we further leverage neural states to introduce an enhanced activation-state-constrained optimization method using a threshold-based margin (TM) loss, which maintains activation state consistency during replay learning. We are the first to validate that a small, fixed set of pre-collected general replay samples is sufficient to resolve both concerns: retaining general capabilities while promoting overall performance across sequential tasks. Indeed, the former can inherently facilitate the latter. Through controlled experiments, we systematically compare TM with different replay strategies under the GeRe framework, including vanilla label fitting, logit imitation via KL divergence, and feature imitation via L1/L2 losses. Results demonstrate that TM consistently improves performance and exhibits better robustness. Our work paves the way for efficient replay of LLMs in the future. Our code and data are available at https://github.com/Qznan/GeRe.
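The abstract names three baseline replay objectives that GeRe compares against TM: vanilla label fitting, logit imitation via KL divergence, and feature imitation via L1/L2 losses. The following is a minimal sketch of those baselines for a single replay token, in plain Python with no deep-learning framework. It assumes that "teacher" logits and hidden states come from a frozen reference copy of the model and "student" values come from the model being fine-tuned. All function names, and the `replay_weight` mixing parameter, are hypothetical and are not taken from the GeRe repository. The TM loss is not sketched because the abstract does not give its formula.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_fitting_loss(student_logits: Sequence[float], target_id: int) -> float:
    """Vanilla label fitting: cross-entropy on the replay text's next token."""
    probs = softmax(student_logits)
    return -math.log(probs[target_id])


def kl_logit_loss(teacher_logits: Sequence[float],
                  student_logits: Sequence[float]) -> float:
    """Logit imitation: KL(p_teacher || p_student) over the vocabulary."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)


def l1_feature_loss(teacher_h: Sequence[float], student_h: Sequence[float]) -> float:
    """Feature imitation: mean absolute difference between hidden states."""
    return sum(abs(t - s) for t, s in zip(teacher_h, student_h)) / len(teacher_h)


def l2_feature_loss(teacher_h: Sequence[float], student_h: Sequence[float]) -> float:
    """Feature imitation: mean squared difference between hidden states."""
    return sum((t - s) ** 2 for t, s in zip(teacher_h, student_h)) / len(teacher_h)


def combined_loss(task_loss: float, replay_loss: float,
                  replay_weight: float = 1.0) -> float:
    """Hypothetical mixing of the current-task loss with one replay term."""
    return task_loss + replay_weight * replay_loss


if __name__ == "__main__":
    teacher_logits = [2.0, 0.5, -1.0]
    student_logits = [1.5, 0.7, -0.8]
    print(label_fitting_loss(student_logits, target_id=0))
    print(kl_logit_loss(teacher_logits, student_logits))
    print(l1_feature_loss([0.1, -0.3], [0.0, -0.1]))
    print(l2_feature_loss([0.1, -0.3], [0.0, -0.1]))
```

In this sketch, label fitting only needs the replay text itself, while the KL and L1/L2 variants also need the reference model's outputs on the same replay samples. This is why they are usually described as imitation or distillation-style objectives.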
