Unlocking the Potential of Model Merging for Low-Resource Languages

Abstract

Adapting large language models (LLMs) to new languages typically involves continual pre-training (CT) followed by supervised fine-tuning (SFT). However, this CT-then-SFT approach struggles with limited data in the context of low-resource languages, failing to balance language modeling and task-solving capabilities. We thus propose model merging as an alternative for low-resource languages, combining models with distinct capabilities into a single model without additional training. We use model merging to develop task-solving LLMs for low-resource languages without SFT data in the target languages. Our experiments based on Llama-2-7B demonstrate that model merging effectively endows LLMs for low-resource languages with task-solving abilities, outperforming CT-then-SFT in scenarios with extremely scarce data. Observing performance saturation in model merging with more training tokens, we further analyze the merging process and introduce a slack variable to the model merging algorithm to mitigate the loss of important parameters, thereby enhancing performance. We hope that model merging can benefit more human languages suffering from data scarcity with its higher data efficiency.
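
To make the idea of combining models without additional training concrete, the following is a minimal sketch of one generic weight-space merging scheme (adding per-parameter differences from a shared base model, often called task arithmetic). It is an illustration under stated assumptions, not the merging algorithm used in the paper, and it does not include the slack variable mentioned above. The toy weights, the names `task_vector` and `merge`, and the `scale` parameter are all hypothetical.

```python
from typing import Dict, List

# Toy stand-in for a model checkpoint: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def task_vector(finetuned: Weights, base: Weights) -> Weights:
    """Per-parameter difference between an adapted model and its base."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def merge(base: Weights, experts: List[Weights], scale: float = 1.0) -> Weights:
    """Add the scaled task vector of every expert onto the base weights."""
    merged = {name: list(values) for name, values in base.items()}
    for expert in experts:
        delta = task_vector(expert, base)
        for name in merged:
            merged[name] = [m + scale * d for m, d in zip(merged[name], delta[name])]
    return merged


# Hypothetical toy checkpoints that share the same base model.
base = {"w": [0.0, 1.0, 2.0]}
language_model = {"w": [0.5, 1.0, 2.0]}  # assumed: base after CT on the target language
task_model = {"w": [0.0, 1.0, 3.0]}      # assumed: base after SFT on task data

merged = merge(base, [language_model, task_model])
print(merged)  # {'w': [0.5, 1.0, 3.0]}
```

In this toy case the two adapted models change different parameters, so the merged weights keep both changes. When the changes overlap or conflict, a merging method has to decide which parameters to keep, which is where the loss of important parameters discussed in the abstract can arise.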
