Abstract
Parameter-Efficient Fine-Tuning (PEFT) methods have become crucial for rapidly adapting large language models (LLMs) to downstream tasks. Prefix-Tuning, an early and effective PEFT technique, demonstrated the ability to achieve performance comparable to full fine-tuning with significantly reduced computational and memory overhead. However, despite its earlier success, its effectiveness in training modern state-of-the-art LLMs has been very limited. In this work, we demonstrate empirically that Prefix-Tuning underperforms on LLMs because of an inherent tradeoff between input and prefix significance within the attention head. This motivates us to introduce Prefix-Tuning+, a novel architecture that generalizes the principles of Prefix-Tuning while addressing its shortcomings by shifting the prefix module out of the attention head itself. We further provide an overview of our construction process to guide future users when constructing their own context-based methods. Our experiments show that, across a diverse set of benchmarks, Prefix-Tuning+ consistently outperforms existing Prefix-Tuning methods. Notably, it achieves performance on par with the widely adopted LoRA method on several general benchmarks, highlighting the potential of modern extensions of Prefix-Tuning approaches. Our findings suggest that by overcoming its inherent limitations, Prefix-Tuning can remain a competitive and relevant research direction in the landscape of parameter-efficient LLM adaptation.