Parameter-Efficient Routed Fine-Tuning: Mixture-of-Experts Demands Mixture of Adaptation Modules

  • 2025-08-04 16:43:09
  • Yilun Liu, Yunpu Ma, Yuetian Lu, Shuo Chen, Zifeng Ding, Volker Tresp

Abstract

Mixture-of-Experts (MoE) models benefit from a dynamic routing mechanism among their specialized experts, which existing Parameter-Efficient Fine-Tuning (PEFT) strategies fail to leverage. This motivates us to investigate whether adaptation modules themselves should incorporate routing mechanisms to align with MoE's multi-expert architecture. We analyze the dynamics of core components when applying PEFT to MoE language models and examine how different routing strategies affect adaptation effectiveness. Extensive experiments adapting OLMoE-1B-7B and Mixtral-8x7B on various commonsense and math reasoning tasks validate the performance and efficiency of our routed approach. We identify the optimal configurations for different scenarios and provide empirical analyses with practical insights to facilitate better PEFT and MoE applications.

 
