Arcee's MergeKit: A Toolkit for Merging Large Language Models

Abstract

The rapid expansion of the open-source language model landscape presents an opportunity to merge the competencies of these model checkpoints by combining their parameters. Advances in transfer learning, the process of fine-tuning pretrained models for specific tasks, have resulted in the development of a vast number of task-specific models, typically specialized in individual tasks and unable to utilize each other's strengths. Model merging facilitates the creation of multitask models without the need for additional training, offering a promising avenue for enhancing model performance and versatility. By preserving the intrinsic capabilities of the original models, model merging addresses complex challenges in AI, including catastrophic forgetting and multitask learning. To support this expanding area of research, we introduce MergeKit, a comprehensive, open-source library designed to facilitate the application of model merging strategies. MergeKit offers an extensible framework to efficiently merge models on any hardware, providing utility to researchers and practitioners. To date, thousands of models have been merged by the open-source community, leading to the creation of some of the world's most powerful open-source model checkpoints, as assessed by the Open LLM Leaderboard. The library is accessible at https://github.com/arcee-ai/MergeKit.
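To make "combining their parameters" concrete, here is a minimal sketch of one of the simplest merging strategies: a normalized weighted average of corresponding parameters across checkpoints. It assumes all checkpoints share the same architecture and parameter names (e.g. they were fine-tuned from a common base), and it represents each checkpoint as a plain dict of flat float lists. The names `StateDict` and `linear_merge` are hypothetical illustrations, not MergeKit's API, and this is not presented as the paper's method.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a checkpoint: parameter name -> flat list of floats.
StateDict = Dict[str, List[float]]


def linear_merge(models: Sequence[StateDict], weights: Sequence[float]) -> StateDict:
    """Merge checkpoints via a normalized weighted average of each parameter.

    Assumes every checkpoint has identical parameter names and shapes.
    """
    if not models or len(models) != len(weights):
        raise ValueError("need at least one model and exactly one weight per model")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    norm = [w / total for w in weights]  # rescale weights so they sum to 1

    reference = models[0]
    if any(set(m) != set(reference) for m in models):
        raise ValueError("all checkpoints must have the same parameter names")

    merged: StateDict = {}
    for name in reference:
        # Collect the same parameter from every checkpoint.
        tensors = [m[name] for m in models]
        size = len(tensors[0])
        if any(len(t) != size for t in tensors):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise weighted sum across checkpoints.
        merged[name] = [sum(w * t[i] for w, t in zip(norm, tensors)) for i in range(size)]
    return merged


if __name__ == "__main__":
    math_model = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
    code_model = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
    print(linear_merge([math_model, code_model], weights=[0.5, 0.5]))
    # {'layer.weight': [2.0, 3.0], 'layer.bias': [0.5]}
```

No gradient step is involved: the merged model is produced purely by arithmetic on existing weights, which is why merging needs no additional training. Real merging methods, and real tensors, are considerably more involved than this element-wise average.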
