Abstract
Machine unlearning has been used to remove unwanted knowledge acquired by large language models (LLMs). In this paper, we examine machine unlearning from an optimization perspective, framing it as a regularized multi-task optimization problem, where one task optimizes a forgetting objective and another optimizes the model performance. In particular, we introduce a normalized gradient difference (NGDiff) algorithm, enabling us to have better control over the trade-off between the objectives, while integrating a new, automatic learning rate scheduler. We provide a theoretical analysis and empirically demonstrate the superior performance of NGDiff among state-of-the-art unlearning methods on the TOFU and MUSE datasets while exhibiting stable training.
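The abstract does not spell out the update rule, so the following is only a minimal sketch under a stated assumption. It assumes NGDiff combines the two task gradients by scaling each to unit length before taking their difference: it descends a retain loss (the model-performance task) while ascending a forget loss, and neither task can dominate through gradient magnitude alone. The fixed step size `lr` stands in for the paper's automatic learning rate scheduler, which is not described here. All function names, the toy quadratic losses, and their centers are hypothetical illustrations.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_norm(v: Sequence[float]) -> float:
    """Euclidean norm of a flat gradient vector."""
    return math.sqrt(sum(x * x for x in v))


def ngdiff_direction(grad_retain: Sequence[float],
                     grad_forget: Sequence[float],
                     eps: float = 1e-12) -> Vector:
    """Assumed NGDiff direction: unit retain gradient minus unit forget gradient.

    eps guards against division by zero when a gradient vanishes.
    """
    nr = l2_norm(grad_retain) + eps
    nf = l2_norm(grad_forget) + eps
    return [gr / nr - gf / nf for gr, gf in zip(grad_retain, grad_forget)]


def ngdiff_step(theta: Sequence[float],
                grad_retain: Sequence[float],
                grad_forget: Sequence[float],
                lr: float) -> Vector:
    """One descent step along the combined direction.

    Moving against the direction descends the retain loss and ascends the
    forget loss. lr is a fixed, hypothetical step size, not the paper's
    automatic scheduler.
    """
    d = ngdiff_direction(grad_retain, grad_forget)
    return [t - lr * di for t, di in zip(theta, d)]


# Toy usage with hypothetical quadratic losses L(theta) = ||theta - c||^2,
# whose gradient is 2 * (theta - c).
def quad_grad(theta: Sequence[float], center: Sequence[float]) -> Vector:
    return [2.0 * (t - c) for t, c in zip(theta, center)]


def quad_loss(theta: Sequence[float], center: Sequence[float]) -> float:
    return sum((t - c) ** 2 for t, c in zip(theta, center))


retain_center = [1.0, 0.0]  # hypothetical "keep this behavior" target
forget_center = [0.0, 1.0]  # hypothetical "forget this behavior" target
theta = [0.0, 0.0]          # both losses start at 1.0

for _ in range(20):
    theta = ngdiff_step(theta,
                        quad_grad(theta, retain_center),
                        quad_grad(theta, forget_center),
                        lr=0.05)

print(quad_loss(theta, retain_center), quad_loss(theta, forget_center))
```

Because each gradient is normalized first, rescaling the forget loss by any positive constant leaves the update direction unchanged. For example, a forget gradient of `[0.0, 4.0]` and one of `[0.0, 400.0]` yield the same step. In the toy run, the retain loss falls below its starting value of 1.0 while the forget loss rises above it.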