When Domain Pretraining Interferes with Instruction Alignment: An Empirical Study of Adapter Merging in Medical LLMs

  • 2026-02-01 02:33:11
  • Junyi Zou

Abstract

Large language models (LLMs) can exhibit surprising \emph{adapter interference} when domain adaptation and instruction alignment are combined in safety-critical settings. We study a 14B base model trained with a two-stage LoRA pipeline: (i) domain-oriented pre-training (PT/DOPT) for medical knowledge injection and (ii) supervised fine-tuning (SFT) for instruction following on medical QA. We then form a \emph{weighted adapter merge} by linearly combining the PT and SFT LoRA deltas and exporting a single merged checkpoint for inference. We find that adding PT signal can reactivate latent ``thinking'' behavior and systematically shift the output distribution, even when the training and evaluation templates attempt to disable chain-of-thought. Under a fixed generation setup (template \texttt{qwen3\_nothink}, Temp=0.6, Top-$p$=0.8), pure SFT achieves BLEU-4=17.84 on our validation set, while the merged model (PT=0.3, SFT=0.7) drops to BLEU-4=6.50. Meanwhile, average multiple-choice accuracy remains comparable (0.777 vs 0.778), and MedQA accuracy improves from 0.664 to 0.681. We further show that small pipeline mistakes (e.g., loading the wrong adapter, overwriting the export directory, or a template mismatch) can cause SFT-only behavior to be spuriously attributed to merged models. We provide a lightweight merge-verification routine that numerically checks the merged weights against the intended linear combination, along with full logs for reproducibility.
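To make the weighted merge and its numerical check concrete, the following is a minimal sketch and not the paper's actual routine. It assumes the standard LoRA parameterization, in which each adapter contributes a delta $(\alpha/r)\,BA$ to a base weight matrix, and it treats the merged weight as $W_0 + w_{\text{PT}}\,\Delta W_{\text{PT}} + w_{\text{SFT}}\,\Delta W_{\text{SFT}}$. All function names, the toy matrix sizes, and the tolerance are hypothetical and chosen for illustration only.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_delta(B, A, alpha, rank):
    """LoRA weight delta (alpha / rank) * B @ A (standard scaling, assumed here)."""
    scale = alpha / rank
    return [[scale * v for v in row] for row in matmul(B, A)]

def merge(base, delta_pt, delta_sft, w_pt, w_sft):
    """Intended linear combination: base + w_pt * delta_pt + w_sft * delta_sft."""
    return [
        [b + w_pt * p + w_sft * s for b, p, s in zip(rb, rp, rs)]
        for rb, rp, rs in zip(base, delta_pt, delta_sft)
    ]

def max_abs_diff(x, y):
    """Largest elementwise absolute difference between two matrices."""
    return max(abs(a - b) for rx, ry in zip(x, y) for a, b in zip(rx, ry))

def verify_merge(exported, base, delta_pt, delta_sft, w_pt, w_sft, tol=1e-6):
    """True if the exported weights match the intended combination within tol."""
    expected = merge(base, delta_pt, delta_sft, w_pt, w_sft)
    return max_abs_diff(exported, expected) <= tol

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Toy stand-ins for one weight matrix and its two adapters (hypothetical sizes).
rng = random.Random(0)
d_out, d_in, rank, alpha = 4, 5, 2, 4
base = rand_matrix(d_out, d_in, rng)
delta_pt = lora_delta(rand_matrix(d_out, rank, rng), rand_matrix(rank, d_in, rng), alpha, rank)
delta_sft = lora_delta(rand_matrix(d_out, rank, rng), rand_matrix(rank, d_in, rng), alpha, rank)

# A correctly exported merge with the weights reported in the abstract.
good_export = merge(base, delta_pt, delta_sft, 0.3, 0.7)
# A simulated pipeline mistake: only the SFT adapter was actually applied.
sft_only_export = merge(base, delta_pt, delta_sft, 0.0, 1.0)

ok_good = verify_merge(good_export, base, delta_pt, delta_sft, 0.3, 0.7)
ok_wrong = verify_merge(sft_only_export, base, delta_pt, delta_sft, 0.3, 0.7)
print("correct export passes:", ok_good)
print("SFT-only export passes:", ok_wrong)
```

In a real pipeline the same comparison would be run per layer on tensors loaded from the exported checkpoint, with a tolerance matched to the storage dtype. The check is useful because an SFT-only export differs from the intended merge by $0.3\,(\Delta W_{\text{SFT}} - \Delta W_{\text{PT}})$, which is far above rounding error whenever the two adapters differ.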

 
