Asymmetry in Low-Rank Adapters of Foundation Models

Abstract

Parameter-efficient fine-tuning optimizes large, pre-trained foundation models by updating a subset of parameters; in this class, Low-Rank Adaptation (LoRA) is particularly effective. Inspired by an effort to investigate the different roles of LoRA matrices during fine-tuning, this paper characterizes and leverages unexpected asymmetry in the importance of low-rank adapter matrices. Specifically, when updating the parameter matrices of a neural network by adding a product $BA$, we observe that the $B$ and $A$ matrices have distinct functions: $A$ extracts features from the input, while $B$ uses these features to create the desired output. Based on this observation, we demonstrate that fine-tuning $B$ is inherently more effective than fine-tuning $A$, and that a random untrained $A$ should perform nearly as well as a fine-tuned one. Using an information-theoretic lens, we also bound the generalization of low-rank adapters, showing that the parameter savings of exclusively training $B$ improve the bound. We support our conclusions with experiments on RoBERTa, BART-Large, LLaMA-2, and ViTs.
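
To make the update $W_0 + BA$ concrete, the following is a minimal sketch of the "freeze a random $A$, train only $B$" setup on a toy least-squares problem. It is an illustration under stated assumptions, not the paper's experimental code: the dimensions, the learning rate, the synthetic target, and the helper names (`matmul`, `transpose`, `train_b_only`) are all hypothetical choices made for this example.

```python
import math
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def train_b_only(W0, A, X, Y, steps=300, lr=0.05):
    """Fit Y ~ (W0 + B A) X by gradient descent on B only.

    W0 (d_out x d_in) and A (r x d_in) are never modified.
    Columns of X (d_in x n) and Y (d_out x n) are individual samples.
    """
    d_out, r, n = len(W0), len(A), len(X[0])
    B = [[0.0] * r for _ in range(d_out)]  # zero init: the adapter starts as a no-op
    Z = matmul(A, X)      # features extracted by the frozen A (r x n)
    base = matmul(W0, X)  # output of the frozen pre-trained weights (d_out x n)
    losses = []
    for _ in range(steps):
        delta = matmul(B, Z)
        resid = [[b + d - y for b, d, y in zip(br, dr, yr)]
                 for br, dr, yr in zip(base, delta, Y)]
        losses.append(sum(e * e for row in resid for e in row) / n)
        grad = matmul(resid, transpose(Z))  # d(loss)/dB, up to the 2/n factor
        B = [[b - lr * 2.0 * g / n for b, g in zip(brow, grow)]
             for brow, grow in zip(B, grad)]
    return B, losses

# Toy setup (all sizes are assumptions for illustration only).
random.seed(0)
d_in, d_out, r, n = 6, 4, 2, 32
W0 = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
# Random, untrained A, scaled so that the features A x have roughly unit variance.
A = [[random.gauss(0, 1) / math.sqrt(d_in) for _ in range(d_in)] for _ in range(r)]
A_before = [row[:] for row in A]
X = [[random.gauss(0, 1) for _ in range(n)] for _ in range(d_in)]
# Synthetic fine-tuning target: the pre-trained map plus a small perturbation.
shift = [[0.3 * random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
W_target = [[w + s for w, s in zip(wr, sr)] for wr, sr in zip(W0, shift)]
Y = matmul(W_target, X)

B, losses = train_b_only(W0, A, X, Y)
print(f"loss before: {losses[0]:.4f}, loss after: {losses[-1]:.4f}")
```

Only the $d_{out} \times r$ entries of $B$ receive gradient updates here, which is the parameter saving the abstract refers to. Because the target shift in this toy problem is generally not of rank $r$, the final loss need not reach zero; the sketch only shows the mechanics of training $B$ against a fixed random $A$.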
