LoRA-PAR: A Flexible Dual-System LoRA Partitioning Approach to Efficient LLM Fine-Tuning

Abstract

Large-scale generative models like DeepSeek-R1 and OpenAI-O1 benefit substantially from chain-of-thought (CoT) reasoning, yet pushing their performance typically requires vast data, large model sizes, and full-parameter fine-tuning. While parameter-efficient fine-tuning (PEFT) helps reduce cost, most existing approaches primarily address domain adaptation or layer-wise allocation rather than explicitly tailoring data and parameters to different response demands. Inspired by "Thinking, Fast and Slow," which characterizes two distinct modes of thought, System 1 (fast, intuitive, often automatic) and System 2 (slower, more deliberative and analytic), we draw an analogy: different "subregions" of an LLM's parameters might similarly specialize for tasks that demand quick, intuitive responses versus those requiring multi-step logical reasoning. Therefore, we propose LoRA-PAR, a dual-system LoRA framework that partitions both data and parameters by System 1 or System 2 demands, using fewer yet more focused parameters for each task. Specifically, we classify task data via multi-model role-playing and voting, partition parameters based on importance scoring, and then adopt a two-stage fine-tuning strategy: System 1 tasks are first trained with supervised fine-tuning (SFT) to enhance knowledge and intuition, and System 2 tasks are then refined with reinforcement learning (RL) to reinforce deeper logical deliberation. Extensive experiments show that the two-stage fine-tuning strategy, SFT followed by RL, lowers active parameter usage while matching or surpassing SOTA PEFT baselines.
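
As a rough illustration of the two partitioning steps named in the abstract, the sketch below assigns each task a System 1 or System 2 label by majority vote and selects a subset of LoRA parameters per system from importance scores. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names (`majority_vote`, `select_top`), the tie-breaking rule, the fixed keep ratio, the parameter names, and all vote and score values are hypothetical and chosen only for illustration.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label chosen by most voters.

    Assumption: a tie is resolved in favour of "system2"; the abstract
    does not say how ties are handled.
    """
    counts = Counter(labels)
    top = max(counts.values())
    winners = [label for label, count in counts.items() if count == top]
    return winners[0] if len(winners) == 1 else "system2"


def select_top(scores, keep_ratio):
    """Return the names of the highest-scoring parameters.

    Assumption: a fixed fraction of parameters is kept; the actual
    selection criterion may differ.
    """
    k = max(1, int(len(scores) * keep_ratio))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


# Hypothetical votes from three role-playing models for two tasks.
votes = {
    "q1": ["system1", "system1", "system2"],
    "q2": ["system2", "system2", "system1"],
}
task_labels = {task: majority_vote(v) for task, v in votes.items()}

# Hypothetical importance scores of four LoRA parameters, per system.
scores_s1 = {"lora_A.0": 0.9, "lora_B.0": 0.1, "lora_A.1": 0.5, "lora_B.1": 0.2}
scores_s2 = {"lora_A.0": 0.2, "lora_B.0": 0.8, "lora_A.1": 0.6, "lora_B.1": 0.1}

s1_params = select_top(scores_s1, keep_ratio=0.5)  # candidates for the SFT stage
s2_params = select_top(scores_s2, keep_ratio=0.5)  # candidates for the RL stage
shared_params = s1_params & s2_params              # important for both systems
```

In this toy setup only half of the parameters are active for each system, and one parameter is selected for both, which mirrors the idea of using fewer yet more focused parameters for each kind of task.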
