EMLoC: Emulator-based Memory-efficient Fine-tuning with LoRA Correction

Abstract

Open-source foundation models have seen rapid adoption and development, enabling powerful general-purpose capabilities across diverse domains. However, fine-tuning large foundation models for domain-specific or personalized tasks remains prohibitively expensive for most users due to the significant memory overhead beyond that of inference. We introduce EMLoC, an Emulator-based Memory-efficient fine-tuning framework with LoRA Correction, which enables model fine-tuning within the same memory budget required for inference. EMLoC constructs a task-specific lightweight emulator using activation-aware singular value decomposition (SVD) on a small downstream calibration set. Fine-tuning is then performed on this lightweight emulator via LoRA. To tackle the misalignment between the original model and the compressed emulator, we propose a novel compensation algorithm to correct the fine-tuned LoRA module, which can then be merged into the original model for inference. EMLoC supports flexible compression ratios and standard training pipelines, making it adaptable to a wide range of applications. Extensive experiments demonstrate that EMLoC outperforms other baselines across multiple datasets and modalities. Moreover, without quantization, EMLoC enables fine-tuning of a 38B model on a single 24GB consumer GPU, bringing efficient and practical model adaptation to individual users.
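
To make the emulator-construction step more concrete, the following is a minimal sketch of one generic form of activation-aware SVD applied to a single weight matrix. It is not the paper's implementation: the function name `activation_aware_low_rank`, the per-channel RMS scaling, and the toy shapes are assumptions chosen for illustration, and the LoRA correction step is not reproduced here because the abstract does not specify it.

```python
import numpy as np


def activation_aware_low_rank(W, X, rank, eps=1e-6):
    """Factor W (out x in) into A (out x rank) and B (rank x in).

    X (n_samples x in) holds calibration activations. Input channels with
    larger activations are weighted more heavily before truncation, so the
    low-rank error concentrates on channels that matter less for this data.
    This is an illustrative variant, assumed here, not EMLoC's exact method.
    """
    # Per-input-channel RMS of the calibration activations (assumed scaling).
    s = np.sqrt(np.mean(X ** 2, axis=0)) + eps          # shape: (in,)
    # SVD of the scaled weights W @ diag(s).
    U, sigma, Vt = np.linalg.svd(W * s, full_matrices=False)
    # Keep the top `rank` components and undo the scaling on the right factor.
    A = U[:, :rank] * sigma[:rank]                      # shape: (out, rank)
    B = Vt[:rank] / s                                   # shape: (rank, in)
    return A, B


# Toy usage: a 64 x 128 layer compressed to rank 16.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))
X = rng.standard_normal((256, 128)) * np.linspace(0.1, 5.0, 128)
A, B = activation_aware_low_rank(W, X, rank=16)

# The emulator layer stores A and B instead of W.
params_full = W.size
params_low_rank = A.size + B.size
rel_output_error = np.linalg.norm(X @ (W - A @ B).T) / np.linalg.norm(X @ W.T)
```

In this sketch the compressed layer holds `A.size + B.size` parameters rather than `W.size`, which is the source of the memory saving during fine-tuning; the chosen `rank` plays the role of the compression ratio.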
