OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning

  • 2024-05-28 18:22:22
  • Pengxiang Li, Lu Yin, Xiaowei Gao, Shiwei Liu

Abstract

The rapid advancements in Large Language Models (LLMs) have revolutionized various natural language processing tasks. However, the substantial size of LLMs presents significant challenges in training or fine-tuning. While parameter-efficient approaches such as low-rank adaptation (LoRA) have gained popularity, they often compromise performance compared to full-rank fine-tuning. In this paper, we propose Outlier-weighed Layerwise Sampled Low-Rank Projection (OwLore), a new memory-efficient fine-tuning approach, inspired by the layerwise outlier distribution of LLMs, which dynamically samples pre-trained layers to fine-tune instead of adding adaptors. We first interpret the outlier phenomenon through the lens of Heavy-Tailed Self-Regularization (HT-SR) theory, discovering that layers with more outliers tend to be more heavy-tailed and consequently better trained. Inspired by this finding, OwLore strategically assigns higher sampling probabilities to layers with more outliers to better leverage the knowledge stored in pre-trained LLMs. To further mitigate the memory demands of fine-tuning, we integrate gradient low-rank projection into our approach, which allows each layer to be trained efficiently in a low-rank manner. By combining the efficiency of low-rank training with outlier-weighted layerwise sampling, OwLore significantly improves the memory-performance trade-off in LLM fine-tuning. Our extensive experiments across various architectures, including LLaMa2, LLaMa3, and Mistral, demonstrate that OwLore consistently outperforms baseline approaches, including full fine-tuning. Specifically, it achieves up to a 1.1% average accuracy gain on the Commonsense Reasoning benchmark, a 3.0% improvement on MMLU, and a notable 10% boost on MT-Bench, while being more memory efficient. OwLore allows us to fine-tune LLaMa2-7B with only 21GB of memory.
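
The two ingredients named in the abstract, outlier-weighted layer sampling and gradient low-rank projection, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The outlier score used here (the fraction of weights whose magnitude exceeds `tau` times the layer's mean magnitude), the threshold `tau`, the SVD-based projection, and all function names are assumptions chosen for illustration only.

```python
import numpy as np


def outlier_ratio(weight, tau=5.0):
    """Assumed score: fraction of entries with magnitude above tau * mean magnitude."""
    mag = np.abs(weight)
    return float(np.mean(mag > tau * mag.mean()))


def sampling_probabilities(scores, eps=1e-8):
    """Normalise non-negative per-layer scores into a probability vector."""
    scores = np.asarray(scores, dtype=np.float64) + eps  # eps keeps every layer reachable
    return scores / scores.sum()


def sample_layers(probs, k, rng):
    """Pick k distinct layer indices, favouring layers with higher probability."""
    return np.sort(rng.choice(len(probs), size=k, replace=False, p=probs))


def project_gradient(grad, rank):
    """Project an (m, n) gradient onto its top-`rank` left singular subspace."""
    u, _, _ = np.linalg.svd(grad, full_matrices=False)
    basis = u[:, :rank]             # (m, rank) orthonormal basis
    return basis, basis.T @ grad    # compressed gradient has shape (rank, n)


# Toy demo: eight random "layers", with large-magnitude entries planted in layer 3.
rng = np.random.default_rng(0)
layers = [rng.standard_normal((64, 64)) for _ in range(8)]
layers[3][rng.integers(0, 64, 40), rng.integers(0, 64, 40)] = 50.0

probs = sampling_probabilities([outlier_ratio(w) for w in layers])
active = sample_layers(probs, k=2, rng=rng)   # layers to unfreeze at this step

grad = rng.standard_normal((64, 64))          # stand-in for a layer gradient
basis, compressed = project_gradient(grad, rank=4)
update = basis @ compressed                   # projected back to the full (64, 64) shape
```

In this sketch only the sampled layers would receive updates at a given step, and an optimizer would keep its state for the `(rank, n)` compressed gradient rather than the full matrix, which is where the memory saving would come from under these assumptions.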

 
