Efficient Continual Learning for Small Language Models with a Discrete Key-Value Bottleneck

Abstract

Continual learning remains a challenge across various natural language processing (NLP) tasks, as models updated with new training data often risk catastrophic forgetting of previously acquired knowledge. We introduce a discrete key-value bottleneck (DKVB) for encoder-only language models, enabling efficient continual learning through localized updates. Inspired by a discrete key-value bottleneck in vision, we consider new and NLP-specific challenges. We compare different bottleneck architectures for NLP and introduce a new, task-independent initialization technique for the discrete keys. We evaluate our DKVB for NLP in four continual learning scenarios and show that it alleviates catastrophic forgetting. Our experiments demonstrate that the proposed approach achieves competitive performance compared to popular continual learning methods while incurring lower computational costs. Furthermore, we show that DKVB remains effective even in challenging single-head continual learning scenarios where no task ID is provided.
