GUARD: Guided Unlearning and Retention via Data Attribution for Large Language Models

Abstract

Unlearning in large language models (LLMs) is becoming increasingly important due to regulatory compliance, copyright protection, and privacy concerns. However, a key challenge in LLM unlearning is unintended forgetting, where the removal of specific data inadvertently impairs the utility of the model and its retention of valuable, desired information. While prior work has primarily focused on architectural innovations, the influence of data-level factors on unlearning performance remains underexplored. As a result, existing methods often suffer from degraded retention when forgetting high-impact data. To address this, we propose GUARD, a novel framework for Guided Unlearning And Retention via Data attribution. At its core, GUARD introduces a lightweight proxy data attribution metric tailored for LLM unlearning, which quantifies the "alignment" between the forget and retain sets while remaining computationally efficient. Building on this, we design a novel unlearning objective that assigns adaptive, nonuniform unlearning weights to samples, inversely proportional to their proxy attribution scores. Through such a reallocation of unlearning power, GUARD mitigates unintended losses in retention. We provide rigorous theoretical guarantees that GUARD significantly enhances retention while maintaining forgetting metrics comparable to prior methods. Extensive experiments on the TOFU benchmark across multiple LLM architectures demonstrate that GUARD substantially improves utility preservation while ensuring effective unlearning. Notably, GUARD reduces utility sacrifice on the Retain Set by up to 194.92% in terms of Truth Ratio when forgetting 10% of the training data.
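
The reweighting idea described in the abstract can be illustrated with a small sketch. The abstract does not give the exact form of the proxy attribution metric or of the weighting rule, so everything concrete here is an assumption made for illustration: the proxy score is taken to be the inner product between a forget sample's gradient and the mean retain-set gradient, and the "inversely proportional" weights are stood in for by a softmax over negated scores, which is only one possible decreasing mapping. The function names, the temperature parameter, and the toy gradients are all hypothetical.

```python
import numpy as np


def proxy_attribution(forget_grads, retain_grads):
    # Assumed proxy (not taken from the paper): alignment of each forget-sample
    # gradient with the mean retain-set gradient, measured by an inner product.
    mean_retain = retain_grads.mean(axis=0)
    return forget_grads @ mean_retain


def unlearning_weights(scores, temperature=1.0):
    # Assumed weighting rule: softmax over negated scores, so samples that are
    # more aligned with the retain set receive smaller unlearning weights.
    logits = -np.asarray(scores, dtype=float) / temperature
    logits = logits - logits.max()  # subtract the max for numerical stability
    w = np.exp(logits)
    return w / w.sum()


def weighted_forget_loss(per_sample_losses, weights):
    # Nonuniform combination of per-sample forget losses.
    return float(np.dot(weights, per_sample_losses))


# Toy data: random vectors standing in for per-sample gradients.
rng = np.random.default_rng(0)
forget_grads = rng.normal(size=(5, 8))    # 5 forget samples, 8 parameters
retain_grads = rng.normal(size=(20, 8))   # 20 retain samples
per_sample_losses = rng.uniform(0.5, 2.0, size=5)

scores = proxy_attribution(forget_grads, retain_grads)
weights = unlearning_weights(scores, temperature=0.5)
loss = weighted_forget_loss(per_sample_losses, weights)
print(scores, weights, loss)
```

With uniform weights, this reduces to an ordinary average forget loss; the nonuniform weights shift unlearning effort away from forget samples whose gradients align most with the retain set, which is the intuition the abstract gives for preserving retention.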
