Abstract
Large Language Models (LLMs) have shown strong abilities in general language tasks, yet adapting them to specific domains remains a challenge. Current methods such as Domain Adaptive Pretraining (DAPT) require costly full-parameter training and suffer from catastrophic forgetting. Meanwhile, Retrieval-Augmented Generation (RAG) introduces substantial inference latency due to expensive nearest-neighbor searches and longer contexts. This paper introduces Memory Decoder, a plug-and-play pretrained memory that enables efficient domain adaptation without changing the original model's parameters. Memory Decoder employs a small transformer decoder that learns to imitate the behavior of an external non-parametric retriever. Once trained, Memory Decoder can be seamlessly integrated with any pretrained language model that shares the same tokenizer, requiring no model-specific modifications. Experimental results demonstrate that Memory Decoder enables effective adaptation of various Qwen and Llama models to three distinct specialized domains (biomedicine, finance, and law), reducing perplexity by an average of 6.17 points. Overall, Memory Decoder introduces a novel paradigm centered on a specially pretrained memory component designed for domain-specific adaptation. This memory can be integrated in a plug-and-play manner, consistently enhancing performance across multiple models within the target domain.
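The abstract does not spell out the training objective or how the memory is combined with the base model, so the following is only a minimal sketch under stated assumptions. It assumes (1) the "non-parametric retriever" is a kNN-LM-style datastore that turns the nearest stored contexts into a next-token distribution, (2) the small decoder is trained to match that distribution with a KL-divergence loss, and (3) at inference its output is linearly interpolated with the base LM's next-token distribution using a weight `lam`. All function names (`knn_distribution`, `kl_divergence`, `interpolate`) and the toy datastore are hypothetical. The sketch also shows why a shared tokenizer matters: the two distributions can only be mixed element-wise if they index the same vocabulary.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def knn_distribution(
    query: Vector,
    datastore: Sequence[Tuple[Vector, int]],
    vocab_size: int,
    k: int = 2,
    temperature: float = 1.0,
) -> List[float]:
    """Hypothetical kNN-LM-style retriever target.

    Each datastore entry pairs a context embedding with the token that followed it.
    The k nearest entries (squared L2 distance) vote for their next tokens with
    weights softmax(-distance / temperature).
    """
    scored = sorted(
        (sum((q - x) ** 2 for q, x in zip(query, key)), token)
        for key, token in datastore
    )[:k]
    weights = softmax([-d / temperature for d, _ in scored])
    probs = [0.0] * vocab_size
    for w, (_, token) in zip(weights, scored):
        probs[token] += w
    return probs


def kl_divergence(p: Vector, q: Vector, eps: float = 1e-12) -> float:
    """KL(p || q): assumed loss for training the memory to imitate the retriever."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def interpolate(p_lm: Vector, p_mem: Vector, lam: float = 0.3) -> List[float]:
    """Assumed plug-in rule: mix base-LM and memory distributions over one vocabulary."""
    if len(p_lm) != len(p_mem):
        raise ValueError("base LM and memory must share a tokenizer/vocabulary")
    return [(1 - lam) * a + lam * b for a, b in zip(p_lm, p_mem)]


# Toy usage with a 3-token vocabulary (all numbers are illustrative, not from the paper).
datastore = [((0.0, 0.0), 2), ((0.1, 0.0), 2), ((5.0, 5.0), 0)]
p_knn = knn_distribution((0.0, 0.0), datastore, vocab_size=3, k=2)
p_mem = [0.1, 0.1, 0.8]  # hypothetical memory-decoder output for the same context
loss = kl_divergence(p_knn, p_mem)  # training signal pushing p_mem toward p_knn
mixed = interpolate([0.5, 0.3, 0.2], p_mem, lam=0.3)
print([round(x, 2) for x in mixed])  # → [0.38, 0.24, 0.38]
```

In this reading, the base model's weights are never touched: only the small decoder is trained, and swapping in a different base LM only requires that its output distribution be over the same token ids.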