FlyLoRA: Boosting Task Decoupling and Parameter Efficiency via Implicit Rank-Wise Mixture-of-Experts

Abstract

Low-Rank Adaptation (LoRA) is a widely used parameter-efficient fine-tuning method for foundation models, but it suffers from parameter interference, resulting in suboptimal performance. Although Mixture-of-Experts (MoE)-based LoRA variants show promise in mitigating intra-task correlations in single-task instruction tuning, they introduce additional router parameters and remain ineffective in multi-task model merging, where inter-task interference arises. Inspired by the fly olfactory circuit, we propose FlyLoRA, an implicit MoE-based LoRA variant that introduces: (1) rank-wise expert activation in the up-projection matrix, and (2) an implicit router that unifies expert routing and down-projection, where a frozen sparse random projection matrix replaces the traditional dense trainable version. This design resolves the trade-off between intra-task decorrelation and computational efficiency by eliminating the need for an explicit router, while inherently mitigating inter-task interference due to the orthogonality property of random matrices. Extensive experiments across four domains (general knowledge understanding, scientific question answering, mathematical reasoning, and code generation) demonstrate consistent performance improvements over existing methods. Beyond empirical gains, FlyLoRA highlights how biological structures can inspire innovations in AI technologies. Code is available at https://github.com/gfyddha/FlyLoRA.
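
To make the two design points concrete, the following is a minimal sketch of what a forward pass of this kind could look like. It is not the authors' implementation. The abstract does not specify the routing rule, the sparsity pattern, or any dimensions, so the sketch assumes that the k ranks with the largest-magnitude down-projected activations are kept, that each row of the frozen matrix has a fixed number of Gaussian nonzeros, and that the names `sparse_random_matrix`, `matvec`, and `flylora_delta` are purely illustrative. The up-projection `B` is given small random values here only so that the output is nonzero.

```python
import random

def sparse_random_matrix(rows, cols, nonzeros_per_row, rng):
    """Frozen sparse random projection (assumed pattern): a few Gaussian entries per row."""
    matrix = [[0.0] * cols for _ in range(rows)]
    for row in matrix:
        for j in rng.sample(range(cols), nonzeros_per_row):
            row[j] = rng.gauss(0.0, 1.0)
    return matrix

def matvec(matrix, vec):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]

def flylora_delta(x, A, B, k):
    """Return (delta, active): the low-rank update using only the k active ranks."""
    # Down-projection with the frozen matrix A; its output doubles as the
    # routing score, so no separate router parameters are needed.
    h = matvec(A, x)
    # Assumption: keep the k ranks with the largest |h_i| (top-k by magnitude).
    active = sorted(range(len(h)), key=lambda i: abs(h[i]), reverse=True)[:k]
    # Rank-wise activation: only the selected columns of B contribute.
    delta = [sum(B[o][i] * h[i] for i in active) for o in range(len(B))]
    return delta, sorted(active)

rng = random.Random(0)
d_in, d_out, r, k = 16, 8, 8, 2  # hypothetical sizes
A = sparse_random_matrix(r, d_in, nonzeros_per_row=4, rng=rng)        # frozen
B = [[rng.gauss(0.0, 0.02) for _ in range(r)] for _ in range(d_out)]  # trainable
x = [rng.gauss(0.0, 1.0) for _ in range(d_in)]

delta, active = flylora_delta(x, A, B, k)
print("active ranks:", active)
```

With `k` equal to the full rank `r`, the sketch reduces to an ordinary LoRA update `B @ (A @ x)` in which `A` happens to be frozen and sparse. With `k < r`, each input touches only a subset of the columns of `B`, which is one way to read "rank-wise expert activation".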
