Synergistic Weak-Strong Collaboration by Aligning Preferences

Abstract

Current Large Language Models (LLMs) excel in general reasoning yet struggle with specialized tasks requiring proprietary or domain-specific knowledge. Fine-tuning large models for every niche application is often infeasible due to black-box constraints and high computational overhead. To address this, we propose a collaborative framework that pairs a specialized weak model with a general strong model. The weak model, tailored to specific domains, produces initial drafts and background information, while the strong model leverages its advanced reasoning to refine these drafts, extending LLMs' capabilities to critical yet specialized tasks. To optimize this collaboration, we introduce collaborative feedback to fine-tune the weak model: it quantifies the influence of the weak model's contributions in the collaboration procedure and establishes preference pairs to guide preference tuning of the weak model. We validate our framework through experiments on three domains. We find that the collaboration significantly outperforms each model alone by leveraging complementary strengths. Moreover, aligning the weak model with the collaborative preference further enhances overall performance.
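
The preference-pair construction described above can be illustrated with a minimal sketch. It is not the paper's implementation: the names `build_preference_pair`, `refine`, and `score` are hypothetical, the strong model and the evaluator are stood in for by toy functions, and picking the best and worst drafts as the pair is an assumption. The only idea taken from the abstract is that a weak-model draft is judged by the quality of the final answer the strong model produces from it.

```python
from typing import Callable, List, Tuple


def build_preference_pair(
    query: str,
    drafts: List[str],
    refine: Callable[[str, str], str],   # hypothetical: strong model refines a draft
    score: Callable[[str, str], float],  # hypothetical: evaluates the final answer
) -> Tuple[str, str]:
    """Return (chosen, rejected) drafts, ranked by the refined answer's score."""
    if len(drafts) < 2:
        raise ValueError("need at least two drafts to form a preference pair")
    # Judge each weak-model draft by the answer the strong model builds on it,
    # not by the draft's standalone quality.
    scored = [(score(query, refine(query, draft)), draft) for draft in drafts]
    scored.sort(key=lambda item: item[0])
    rejected = scored[0][1]   # draft leading to the lowest-scoring final answer
    chosen = scored[-1][1]    # draft leading to the highest-scoring final answer
    return chosen, rejected


# Toy stand-ins (assumptions for illustration only).
def toy_refine(query: str, draft: str) -> str:
    return draft.upper()


def toy_score(query: str, answer: str) -> float:
    return float(answer.count("PARIS"))


chosen, rejected = build_preference_pair(
    "Capital of France?",
    ["maybe lyon", "it is paris"],
    toy_refine,
    toy_score,
)
```

In a real setting, the resulting (chosen, rejected) pairs would serve as training data for preference tuning of the weak model; the choice of tuning algorithm is left open here.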
