Optimizing Cross-Client Domain Coverage for Federated Instruction Tuning of Large Language Models

  • 2025-08-21 09:56:28
  • Zezhou Wang, Yaxin Du, Xingjun Ma, Yugang Jiang, Zhuzhong Qian, Siheng Chen

Abstract

Federated domain-specific instruction tuning (FedDIT) for large language models (LLMs) aims to enhance performance in specialized domains using distributed private and limited data, yet identifying key performance drivers and optimal augmentation strategies remains challenging. We empirically establish that cross-client domain coverage, rather than data heterogeneity, is the pivotal factor. We then introduce FedDCA, an algorithm that explicitly maximizes this coverage through diversity-oriented client center selection and retrieval-based augmentation, constructing diverse, non-redundant cross-client instruction sets. Extensive experiments across multiple domains demonstrate FedDCA's superiority over eleven baselines, achieving performance gains of up to 29.19% and domain coverage improvements of 4.82%-21.36%. FedDCA maintains its effectiveness in diverse and challenging scenarios, including data selection, held-out settings where task-specific public data is scarce, and varying levels of data heterogeneity, with manageable privacy risks. This work clarifies critical FedDIT dynamics and presents FedDCA as an effective, privacy-preserving, and scalable solution for advancing domain-specific LLM tuning.
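
The two ingredients named in the abstract, diversity-oriented center selection and retrieval-based augmentation, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the selection rule, the distance measure, or the retrieval procedure. The sketch assumes that instructions are already embedded as numeric vectors, swaps in greedy farthest-point selection as a generic diversity heuristic, and uses Euclidean nearest-neighbor lookup for retrieval. The function names `select_diverse_centers` and `retrieve_for_centers` are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_diverse_centers(candidates, k):
    """Greedy farthest-point selection (an assumed stand-in, not FedDCA itself).

    Starts from candidate 0, then repeatedly adds the unselected candidate
    whose distance to its nearest already-selected candidate is largest.
    Returns the selected indices in the order they were chosen.
    """
    if k <= 0 or not candidates:
        return []
    selected = [0]
    # min_dist[i] = distance from candidate i to its closest selected center
    min_dist = [dist(c, candidates[0]) for c in candidates]
    while len(selected) < min(k, len(candidates)):
        remaining = [i for i in range(len(candidates)) if i not in selected]
        nxt = max(remaining, key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, c in enumerate(candidates):
            min_dist[i] = min(min_dist[i], dist(c, candidates[nxt]))
    return selected


def retrieve_for_centers(centers, public_embeddings, per_center):
    """For each center, pick its nearest public items not already taken.

    Skipping items claimed by an earlier center keeps the combined
    cross-center set free of duplicates.
    """
    used = set()
    result = []
    for c in centers:
        ranked = sorted(
            range(len(public_embeddings)),
            key=lambda j: dist(c, public_embeddings[j]),
        )
        picked = [j for j in ranked if j not in used][:per_center]
        used.update(picked)
        result.append(picked)
    return result
```

In this toy setup, the selected indices would identify which candidate centers to keep, and the retrieved indices would identify which public instructions to add for each kept center. Real embeddings, a real public corpus, and the paper's actual coverage objective would replace these placeholders.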

 
