LoRA-Leak: Membership Inference Attacks Against LoRA Fine-tuned Language Models

Abstract

Language Models (LMs) typically adhere to a "pre-training and fine-tuning" paradigm, where a universal pre-trained model can be fine-tuned to cater to various specialized domains. Low-Rank Adaptation (LoRA) has gained the most widespread use in LM fine-tuning due to its lightweight computational cost and remarkable performance. Because the proportion of parameters tuned by LoRA is relatively small, there might be a misleading impression that the LoRA fine-tuning data is invulnerable to Membership Inference Attacks (MIAs). However, we identify that utilizing the pre-trained model can induce more information leakage, which is neglected by existing MIAs. Therefore, we introduce LoRA-Leak, a holistic evaluation framework for MIAs against the fine-tuning datasets of LMs. LoRA-Leak incorporates fifteen membership inference attacks, including ten existing MIAs and five improved MIAs that leverage the pre-trained model as a reference. In experiments, we apply LoRA-Leak to three advanced LMs across three popular natural language processing tasks, demonstrating that LoRA-based fine-tuned LMs are still vulnerable to MIAs (e.g., 0.775 AUC under conservative fine-tuning settings). We also apply LoRA-Leak to different fine-tuning settings to understand the resulting privacy risks. We further explore four defenses and find that only dropout and excluding specific LM layers during fine-tuning effectively mitigate MIA risks while maintaining utility. We highlight that under the "pre-training and fine-tuning" paradigm, the existence of the pre-trained model makes MIA a more severe risk for LoRA-based LMs. We hope that our findings can provide guidance on data privacy protection for specialized LM providers.
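
To make the idea of using the pre-trained model as a reference concrete, the sketch below shows one common form of reference calibration: score each sample by how much the fine-tuned model's loss drops relative to the pre-trained model's loss, then measure attack quality with AUC. This is an illustrative assumption, not the paper's exact formulation; the abstract does not specify how the five improved attacks are scored. The function names and the toy loss values are hypothetical.

```python
from typing import List, Sequence


def calibrated_scores(ft_losses: Sequence[float],
                      pt_losses: Sequence[float]) -> List[float]:
    """Membership score per sample: pre-trained loss minus fine-tuned loss.

    A larger drop in loss after fine-tuning suggests the sample was seen
    during fine-tuning. (Assumed scoring rule, for illustration only.)
    """
    if len(ft_losses) != len(pt_losses):
        raise ValueError("loss lists must have the same length")
    return [pt - ft for ft, pt in zip(ft_losses, pt_losses)]


def auc(member_scores: Sequence[float],
        nonmember_scores: Sequence[float]) -> float:
    """AUC as the probability that a random member outscores a random
    non-member, counting ties as one half."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


# Hypothetical per-sample losses (not taken from the paper).
member_ft, member_pt = [1.0, 2.5], [2.0, 3.5]
nonmember_ft, nonmember_pt = [1.8, 2.9], [2.0, 3.0]

# Baseline: lower fine-tuned loss alone is treated as evidence of membership.
loss_only_auc = auc([-x for x in member_ft], [-x for x in nonmember_ft])

# Reference-calibrated: compare against the pre-trained model's loss.
calibrated_auc = auc(calibrated_scores(member_ft, member_pt),
                     calibrated_scores(nonmember_ft, nonmember_pt))
```

In this toy data, one member is an intrinsically hard sample with a high loss under both models, so the loss-only attack misranks it. Subtracting the pre-trained loss removes that per-sample difficulty, which is the intuition behind using the pre-trained model as a reference.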
