Abstract
Mixture-of-Experts (MoE) has been gaining popularity due to its successful adaptation to large language models (LLMs). In this work, we introduce Privacy-preserving Collaborative Mixture-of-Experts (PC-MoE), which leverages the sparsity of the MoE architecture for memory-efficient decentralized collaborative LLM training, enabling multiple parties with limited GPU memory and data resources to collectively train more capable LLMs than they could achieve individually. At the same time, this approach protects the training data privacy of each participant by keeping training data, as well as parts of the forward-pass signal and gradients, local to each party. By design, PC-MoE synergistically combines the strengths of distributed computation with strong confidentiality assurances. Unlike most privacy-preserving schemes, which pay for confidentiality with lower task accuracy, our framework breaks that trade-off: across seven popular LLM benchmarks, it almost matches (and sometimes exceeds) the performance and convergence rate of a fully centralized model and reduces peak GPU RAM by nearly 70%, while being fully robust against reconstruction attacks.