LoRA-Pro: Are Low-Rank Adapters Properly Optimized?

Abstract

Low-Rank Adaptation, also known as LoRA, has emerged as a prominent method for parameter-efficient fine-tuning of foundation models by re-parameterizing the original matrix into the product of two low-rank matrices. Despite its efficiency, LoRA often yields inferior performance compared to full fine-tuning. In this paper, we propose LoRA-Pro to bridge this performance gap. First, we delve into the optimization processes in LoRA and full fine-tuning. We reveal that while LoRA employs a low-rank approximation, it neglects to approximate the optimization process of full fine-tuning. To address this, we introduce a novel concept called the "equivalent gradient." This virtual gradient acts on the re-parameterized matrix and makes its optimization process equivalent to that of LoRA, so it can be used to quantify the differences between LoRA and full fine-tuning. The equivalent gradient is derived from the gradients of matrices $A$ and $B$. To narrow the performance gap, our approach minimizes the differences between the equivalent gradient and the gradient obtained from full fine-tuning during the optimization process. By solving this objective, we derive optimal closed-form solutions for updating matrices $A$ and $B$. Our method constrains the optimization process, shrinking the performance gap between LoRA and full fine-tuning. Extensive experiments on natural language processing tasks validate the effectiveness of our method.
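
The idea of an equivalent gradient can be illustrated with a small numerical sketch. It is not the paper's closed-form solution. It assumes the common LoRA parameterization $W = W_0 + BA$ trained with plain gradient descent, and it uses only the chain rule to show which gradient on $W$ a LoRA step corresponds to, to first order in the learning rate. The function name `lora_equivalent_gradient`, the matrix shapes, and the random data are hypothetical and chosen for illustration only.

```python
import numpy as np

def lora_equivalent_gradient(B, A, g_full):
    """Gradients of A and B, and the first-order gradient they induce on W.

    Assumes W = W0 + B @ A (no scaling factor) and that g_full is dL/dW,
    i.e. the gradient full fine-tuning would use.
    """
    # Chain rule through W = W0 + B @ A.
    g_B = g_full @ A.T          # dL/dB, shape (m, r)
    g_A = B.T @ g_full          # dL/dA, shape (r, n)
    # A small step dB = -lr * g_B, dA = -lr * g_A changes W by
    # dB @ A + B @ dA (to first order), i.e. -lr * g_equiv.
    g_equiv = g_B @ A + B @ g_A
    return g_A, g_B, g_equiv

rng = np.random.default_rng(0)
m, n, r = 6, 5, 2               # illustrative sizes, rank r < min(m, n)
B = rng.normal(size=(m, r))
A = rng.normal(size=(r, n))
g_full = rng.normal(size=(m, n))

g_A, g_B, g_equiv = lora_equivalent_gradient(B, A, g_full)

# Check the first-order claim with an actual gradient-descent step on A and B.
lr = 1e-4
delta_W = (B - lr * g_B) @ (A - lr * g_A) - B @ A
first_order_error = np.abs(delta_W + lr * g_equiv).max()

# Distance between what LoRA effectively applies and the full gradient.
gap = np.linalg.norm(g_equiv - g_full)
print(f"first-order error: {first_order_error:.2e}, gap to full gradient: {gap:.3f}")
```

In this sketch the induced gradient has rank at most $2r$, so it generally cannot match a full-rank gradient, and `gap` gives one way to measure the mismatch that the abstract refers to.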
