Abstract
The widespread adoption of cloud-based proprietary large language models (LLMs) has introduced significant challenges, including operational dependencies, privacy concerns, and the necessity of continuous internet connectivity. In this work, we introduce an LLMOps pipeline, "LlamaDuo", for the seamless migration of knowledge and abilities from service-oriented LLMs to smaller, locally manageable models. This pipeline is crucial for ensuring service continuity in the presence of operational failures, strict privacy policies, or offline requirements. Our LlamaDuo involves fine-tuning a small language model against the service LLM using a synthetic dataset generated by the latter. If the performance of the fine-tuned model falls short of expectations, it is automatically improved through additional fine-tuning using extra similar data generated by the service LLM. This multi-turn process guarantees that the smaller model can eventually match or even surpass the service LLM's capabilities in specific downstream tasks, offering a practical and scalable solution for managing AI deployments in constrained environments. Extensive experiments with leading-edge LLMs are conducted to demonstrate the effectiveness, adaptability, and affordability of LlamaDuo across various downstream tasks. Our pipeline implementation is available at https://github.com/deep-diver/llamaduo.
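The multi-turn process described above can be illustrated with a minimal sketch. It is not the released implementation: the function `iterative_distillation`, its callable parameters (`generate_synthetic`, `fine_tune`, `evaluate`), the `threshold`, and the `max_rounds` cap are all hypothetical names introduced here for illustration, and retraining on the accumulated dataset at each round is an assumption rather than a detail stated in the abstract.

```python
from typing import Any, Callable, List, Tuple

Example = Tuple[str, str]  # hypothetical (prompt, response) pair


def iterative_distillation(
    generate_synthetic: Callable[[int], List[Example]],  # stands in for the service LLM
    fine_tune: Callable[[List[Example]], Any],           # stands in for small-model training
    evaluate: Callable[[Any], float],                    # stands in for the evaluation step
    threshold: float,
    batch_size: int = 100,
    max_rounds: int = 5,
) -> Tuple[Any, float, int]:
    """Fine-tune, evaluate, and add synthetic data until the score meets the threshold.

    All parameters are illustrative assumptions, not the LlamaDuo API.
    """
    train_data = generate_synthetic(batch_size)  # initial synthetic dataset
    model, score, rounds = None, float("-inf"), 0
    for rounds in range(1, max_rounds + 1):
        model = fine_tune(train_data)   # assumption: retrain on all data so far
        score = evaluate(model)
        if score >= threshold:          # performance meets expectations: stop
            break
        train_data = train_data + generate_synthetic(batch_size)  # request more data
    return model, score, rounds


# Toy stand-ins so the sketch runs end to end: the "model" is just the
# dataset size, and the score grows linearly with it up to 1.0.
def toy_generate(n: int) -> List[Example]:
    return [(f"prompt {i}", f"response {i}") for i in range(n)]


def toy_fine_tune(data: List[Example]) -> int:
    return len(data)


def toy_evaluate(model: int) -> float:
    return min(1.0, model / 300)


model, score, rounds = iterative_distillation(
    toy_generate, toy_fine_tune, toy_evaluate, threshold=0.9
)
print(model, score, rounds)
```

With these toy stand-ins the loop stops once the accumulated dataset is large enough for the toy score to reach the threshold. In a real deployment the three callables would wrap the service LLM, the fine-tuning job, and the chosen evaluation metric, and the `max_rounds` cap bounds the cost if the threshold is never reached.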