Compress to Impress: Efficient LLM Adaptation Using a Single Gradient Step on 100 Samples

Abstract

Recently, Sharma et al. proposed a method called LAyer-SElective Rank reduction (LASER), which demonstrated that pruning the high-order components of carefully chosen weight matrices of an LLM can boost downstream accuracy without any gradient-based fine-tuning. Yet LASER's exhaustive per-matrix search, in which each candidate requires forward passes over the full dataset, makes it impractical for rapid deployment. We demonstrate that this overhead can be removed and find that: (i) only a small, carefully chosen subset of matrices needs to be inspected, eliminating the layer-by-layer sweep; (ii) the gradient of each matrix's singular values pinpoints which matrices merit reduction; (iii) enlarging the factorization search space, by allowing the rows of a matrix to cluster around multiple subspaces and then decomposing each cluster separately, further reduces overfitting to the original training data and lifts accuracy by up to 24.6 percentage points; and finally, (iv) evaluating on just 100 samples rather than the full training data, both for computing the indicative gradients and for measuring the final accuracy, suffices and further reduces the search time. We explain this by observing that adaptation to downstream tasks is dominated by prompting style, not dataset size. As a result, combining these findings yields a fast and robust algorithm for adapting to downstream tasks. Overall, with a single gradient step on 100 examples and a quick scan of the top candidate layers and factorization techniques, we can adapt LLMs to new datasets entirely without fine-tuning.
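The core operations mentioned above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation, and all function names are hypothetical. `rank_reduce` keeps only the top singular components of a matrix, as in LASER. `singular_value_grads` turns one weight gradient G = ∂L/∂W (e.g. from a single backward pass on 100 samples) into per-singular-value gradients using ∂L/∂s_i = u_iᵀ G v_i, which holds when W = U diag(s) Vᵀ with U and V held fixed. `tail_prune_score` is an assumed first-order ranking heuristic, and `clustered_rank_reduce` uses plain Euclidean k-means on rows as a stand-in for the paper's multi-subspace clustering.

```python
import numpy as np


def rank_reduce(W: np.ndarray, keep_frac: float) -> np.ndarray:
    """LASER-style rank reduction: keep only the top-k singular components of W."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    k = max(1, int(round(keep_frac * len(s))))
    # U[:, :k] * s[:k] scales each kept left singular vector by its singular value.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


def singular_value_grads(W: np.ndarray, G: np.ndarray):
    """Map a weight gradient G = dL/dW to gradients w.r.t. the singular values of W.

    Writing W = U diag(s) V^T and holding U, V fixed gives dL/ds_i = u_i^T G v_i.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Column i of G @ Vt.T is G v_i; multiplying by U and summing rows gives u_i^T G v_i.
    g = np.sum(U * (G @ Vt.T), axis=0)
    return s, g


def tail_prune_score(W: np.ndarray, G: np.ndarray, keep_frac: float) -> float:
    """Hypothetical first-order estimate of the loss change from pruning W's tail.

    Setting s_i to 0 changes the loss by about -s_i * dL/ds_i, so more negative
    values suggest a more promising candidate matrix (an assumed heuristic).
    """
    s, g = singular_value_grads(W, G)
    k = max(1, int(round(keep_frac * len(s))))
    return float(-np.sum(s[k:] * g[k:]))


def clustered_rank_reduce(W, n_clusters, keep_frac, n_iter=10, seed=0):
    """Cluster the rows of W, then rank-reduce each cluster separately.

    Plain Euclidean k-means stands in for multi-subspace clustering here;
    requires n_clusters <= number of rows and n_iter >= 1.
    """
    rng = np.random.default_rng(seed)
    centers = W[rng.choice(W.shape[0], n_clusters, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared distance from every row to every center: shape (rows, n_clusters).
        dists = ((W[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = W[labels == c].mean(axis=0)
    out = np.empty_like(W, dtype=float)
    for c in range(n_clusters):
        rows = labels == c
        if np.any(rows):
            # Each cluster gets its own SVD and its own truncation.
            out[rows] = rank_reduce(W[rows], keep_frac)
    return out, labels
```

In this sketch, one would compute G for each candidate matrix from a single backward pass on about 100 examples, rank the matrices by `tail_prune_score`, and then evaluate only the top few candidates (with `rank_reduce` or `clustered_rank_reduce`) on the same small sample. For a loss that is linear in W, the first-order score is exact, which makes it easy to sanity-check.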
