Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate

Abstract

Machine unlearning has been used to remove unwanted knowledge acquired by large language models (LLMs). In this paper, we examine machine unlearning from an optimization perspective, framing it as a regularized multi-task optimization problem, where one task optimizes a forgetting objective and another optimizes the model performance. In particular, we introduce a normalized gradient difference (NGDiff) algorithm, enabling us to have better control over the trade-off between the objectives, while integrating a new, automatic learning rate scheduler. We provide a theoretical analysis and empirically demonstrate the superior performance of NGDiff among state-of-the-art unlearning methods on the TOFU and MUSE datasets while exhibiting stable training.
