Bayesian-LoRA: LoRA-based Parameter Efficient Fine-Tuning using Optimal Quantization Levels and Rank Values through Differentiable Bayesian Gates

Abstract

It is common practice in natural language processing to pre-train a single model on a general domain and then fine-tune it for downstream tasks. However, when it comes to Large Language Models, fine-tuning the entire model can be computationally expensive, resulting in very intensive energy consumption. As a result, several Parameter Efficient Fine-Tuning (PEFT) approaches have recently been proposed. One of the most popular approaches is low-rank adaptation (LoRA), where the key insight is decomposing the update weights of the pre-trained model into two low-rank matrices. However, the proposed approaches either use the same rank value across all weight matrices, which has been shown to be a sub-optimal choice, or do not use any quantization technique, one of the most important factors when it comes to a model's energy consumption. In this work, we propose Bayesian-LoRA (B-LoRA), which approaches low-rank adaptation and quantization from a Bayesian perspective by employing a prior distribution on both quantization levels and rank values. As a result, B-LoRA is able to fine-tune a pre-trained model on a specific downstream task, finding the optimal rank values and quantization levels for every low-rank matrix. We validate the proposed model by fine-tuning a pre-trained DeBERTaV3 on the GLUE benchmark. Moreover, we compare it to relevant baselines and present both qualitative and quantitative results, showing how the proposed approach is able to learn optimal-rank quantized matrices. B-LoRA performs on par with or better than the baselines while reducing the total number of bit operations by roughly 70% compared to the baseline methods.
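
For reference, the low-rank decomposition mentioned in the abstract is usually written as below. This is the standard LoRA formulation only, not the Bayesian gating over ranks and quantization levels proposed in this work. The symbols W_0, A, B, d, k and r are notation assumed here for illustration and do not appear in the abstract.

```latex
W = W_0 + \Delta W = W_0 + B A,
\qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)
```

Only A and B are trained while W_0 stays frozen, so the number of trainable parameters per weight matrix drops from dk to r(d + k). For example, assuming d = k = 768 and r = 8, this is 12,288 parameters instead of 589,824.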
