Extend Model Merging from Fine-Tuned to Pre-Trained Large Language Models via Weight Disentanglement

Abstract

Merging Large Language Models (LLMs) aims to amalgamate multiple homologous LLMs into one with all the capabilities. Ideally, any LLMs sharing the same backbone should be mergeable, irrespective of whether they are Fine-Tuned (FT) with minor parameter changes or Pre-Trained (PT) with substantial parameter shifts. However, existing methods often manually assign the model importance, rendering them feasible only for LLMs with similar parameter alterations, such as multiple FT LLMs. The widely differing ranges of parameter change between FT and PT LLMs make it difficult for current solutions to determine the optimal combination empirically. In this paper, we make a pioneering effort to broaden the applicability of merging techniques from FT to PT LLMs. We first examine the efficacy of current methods in merging FT and PT LLMs and find that they struggle to deal with PT LLMs. We then introduce an approach based on WeIght DisENtanglement (WIDEN) to effectively extend the merging scope. WIDEN first disentangles model weights into magnitude and direction components and then performs adaptive fusion by considering their respective contributions. In the experiments, we merge Qwen1.5-Chat (an FT LLM with instruction-following skills) with Sailor (a PT LLM with multilingual abilities) at the 7B and 14B model scales. Results reveal that: (1) existing solutions usually fail when merging Sailor, either losing both abilities or retaining only the instruction-following skills; (2) WIDEN successfully injects the multilingual abilities of Sailor into Qwen1.5-Chat, making it proficient in Southeast Asian languages while also enhancing its fundamental capabilities. In light of previous research, we also merge multiple 13B FT LLMs and observe that WIDEN achieves a balanced amalgamation of instruction-following, mathematical reasoning, and code generation skills.
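
The idea of disentangling weights into magnitude and direction before fusing them can be illustrated with a minimal sketch. This is not the authors' WIDEN implementation. The function names (`disentangle`, `importance_weights`, `merge_column`, `merge_matrix`), the treatment of each weight column as an independent unit, the choice of absolute norm difference and cosine distance from the shared backbone as divergence scores, the softmax temperature, and the equal averaging of the two scores are all assumptions made for illustration. Weights are plain Python lists of columns so the example has no dependencies.

```python
import math
from typing import List, Tuple

Column = List[float]  # one column of a weight matrix


def disentangle(col: Column) -> Tuple[float, Column]:
    """Split a weight column into its magnitude (L2 norm) and unit direction."""
    mag = math.sqrt(sum(x * x for x in col))
    if mag == 0.0:
        return 0.0, [0.0] * len(col)
    return mag, [x / mag for x in col]


def softmax(scores: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over per-model scores."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def importance_weights(base: Column, models: List[Column],
                       temperature: float = 1.0) -> List[float]:
    """Score each model's column by how far it moved from the backbone.

    Assumption (not from the paper): magnitude divergence is
    |norm - base_norm|, direction divergence is 1 - cosine similarity,
    and the two softmax-normalized scores are averaged with equal weight.
    """
    base_mag, base_dir = disentangle(base)
    mag_div, dir_div = [], []
    for col in models:
        mag, direction = disentangle(col)
        mag_div.append(abs(mag - base_mag))
        cos = sum(a * b for a, b in zip(direction, base_dir))
        dir_div.append(1.0 - cos)
    w_mag = softmax(mag_div, temperature)
    w_dir = softmax(dir_div, temperature)
    return [(a + b) / 2.0 for a, b in zip(w_mag, w_dir)]


def merge_column(base: Column, models: List[Column],
                 temperature: float = 1.0) -> Column:
    """Add the importance-weighted parameter deltas back onto the backbone."""
    weights = importance_weights(base, models, temperature)
    return [b + sum(w * (col[i] - b) for w, col in zip(weights, models))
            for i, b in enumerate(base)]


def merge_matrix(base: List[Column], models: List[List[Column]],
                 temperature: float = 1.0) -> List[Column]:
    """Merge column by column; every model must share the backbone's shape."""
    return [merge_column(base[j], [m[j] for m in models], temperature)
            for j in range(len(base))]


if __name__ == "__main__":
    backbone = [[1.0, 0.0]]
    ft_model = [[1.01, 0.0]]  # small shift, standing in for a fine-tuned model
    pt_model = [[0.0, 2.0]]   # large shift, standing in for a pre-trained model
    print(importance_weights(backbone[0], [ft_model[0], pt_model[0]]))
    print(merge_matrix(backbone, [ft_model, pt_model]))
```

With a backbone column `[1.0, 0.0]`, a lightly shifted column `[1.01, 0.0]`, and a heavily shifted column `[0.0, 2.0]`, the heavily shifted column receives the larger weight (about 0.73 versus 0.27). Under these assumptions, the importance of each model is derived from its own parameter changes rather than from a fixed, manually chosen coefficient, which is the limitation of existing methods described above.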
