Abstract
Fine-tuning Large Language Models (LLMs) has proven effective for a variety of downstream tasks. However, as LLMs grow in size, the memory demands for backpropagation become increasingly prohibitive. Zeroth-order (ZO) optimization methods offer a memory-efficient alternative by using forward passes to estimate gradients, but the variance of gradient estimates typically scales linearly with the model's parameter dimension, a significant issue for LLMs. In this paper, we propose the random Subspace Zeroth-order (SubZero) optimization to address the challenges posed by LLMs' high dimensionality. We introduce a low-rank perturbation tailored for LLMs that significantly reduces memory consumption while improving training performance. Additionally, we prove that our gradient estimation closely approximates the backpropagation gradient, exhibits lower variance than traditional ZO methods, and ensures convergence when combined with SGD. Experimental results show that SubZero enhances fine-tuning performance and achieves faster convergence compared to standard ZO approaches like MeZO across various language modeling tasks. Code is available at https://github.com/zimingyy/SubZero.
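
To make the idea of estimating gradients from forward passes concrete, the following is a minimal NumPy sketch of a generic two-point ZO estimate for a single weight matrix, together with one possible way to draw a low-rank perturbation. It is an illustration under stated assumptions, not the SubZero implementation: the function names `zo_gradient` and `low_rank_perturbation`, the step size `eps`, and the construction of the perturbation as `U @ S @ V.T` with orthonormal `U`, `V` obtained by QR are all hypothetical choices made here, since the abstract does not specify them.

```python
import numpy as np


def zo_gradient(loss, W, Z, eps=1e-3):
    """Two-point zeroth-order estimate of the gradient of `loss` at W.

    Uses only two forward evaluations of `loss` (no backpropagation):
    the finite-difference slope along direction Z, multiplied by Z.
    """
    slope = (loss(W + eps * Z) - loss(W - eps * Z)) / (2.0 * eps)
    return slope * Z


def low_rank_perturbation(shape, rank, rng):
    """Draw a random perturbation of the given shape with rank <= `rank`.

    Assumed construction (hypothetical, for illustration only):
    Z = U S V^T, where U (m x r) and V (n x r) have orthonormal columns
    obtained by QR of Gaussian matrices, and S (r x r) is Gaussian.
    Requires rank <= min(m, n).
    """
    m, n = shape
    U, _ = np.linalg.qr(rng.standard_normal((m, rank)))
    V, _ = np.linalg.qr(rng.standard_normal((n, rank)))
    S = rng.standard_normal((rank, rank))
    # The dense product is formed here only for simplicity of the sketch.
    return U @ S @ V.T
```

In this sketch the factors `U`, `S`, and `V` together hold `m*r + r*r + n*r` numbers, compared with `m*n` for a dense Gaussian perturbation of the same matrix, which is the kind of saving a low-rank perturbation can offer when `r` is small. A memory-conscious implementation would avoid materializing the dense product.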