Abstract
Large Language Models (LLMs) deployed in real-world settings increasingly face the need to unlearn sensitive, outdated, or proprietary information. Existing unlearning methods typically formulate forgetting and retention as a regularized trade-off, combining both objectives into a single scalarized loss. This often leads to unstable optimization and degraded performance on retained data, especially under aggressive forgetting. We propose a new formulation of LLM unlearning as a constrained optimization problem: forgetting is enforced via a novel logit-margin flattening loss that explicitly drives the output distribution toward uniformity on a designated forget set, while retention is preserved through a hard constraint on a separate retain set. Compared to entropy-based objectives, our loss is softmax-free, numerically stable, and maintains non-vanishing gradients, enabling more efficient and robust optimization. We solve the constrained problem using a scalable primal-dual algorithm that exposes the trade-off between forgetting and retention through the dynamics of the dual variable. Evaluations on the TOFU and MUSE benchmarks across diverse LLM architectures demonstrate that our approach consistently matches or exceeds state-of-the-art baselines, effectively removing targeted information while preserving downstream utility.
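
To make the primal-dual idea concrete, the following is a minimal sketch on a toy scalar problem, not the method's actual losses or update rule. It assumes a generic constrained problem of the form "minimize f(x) subject to g(x) <= 0", where f stands in for a forgetting objective and g for a retention loss minus a budget. All function names, the quadratic stand-ins, and the step sizes are hypothetical choices made for illustration.

```python
# Toy primal-dual (gradient descent-ascent) sketch.
# ASSUMPTION: the scalar problem and every name below are illustrative
# stand-ins; they are not the paper's losses, model, or algorithm.


def forget_loss(x):
    """Stand-in objective f(x) to minimize (role of the forgetting loss)."""
    return (x - 3.0) ** 2


def forget_grad(x):
    """Derivative of the stand-in objective."""
    return 2.0 * (x - 3.0)


def retain_slack(x):
    """Stand-in constraint g(x) <= 0 (role of retain loss minus a budget)."""
    return x ** 2 - 1.0


def retain_grad(x):
    """Derivative of the stand-in constraint."""
    return 2.0 * x


def primal_dual(x=0.0, lam=0.0, lr_primal=0.05, lr_dual=0.05, steps=5000):
    """Run simultaneous descent on x and projected ascent on lam."""
    history = []
    for _ in range(steps):
        # Primal step: descend on the Lagrangian f(x) + lam * g(x).
        x_new = x - lr_primal * (forget_grad(x) + lam * retain_grad(x))
        # Dual step: ascend on the constraint value, projected onto lam >= 0.
        lam = max(0.0, lam + lr_dual * retain_slack(x))
        x = x_new
        history.append((x, lam))
    return x, lam, history


if __name__ == "__main__":
    x, lam, _ = primal_dual()
    print(f"x = {x:.4f}, lambda = {lam:.4f}, f(x) = {forget_loss(x):.4f}")
```

On this toy problem the constrained minimizer is x = 1 with multiplier lambda = 2, which follows from the stationarity condition 2(x - 3) + 2 * lambda * x = 0 at x = 1. The dual variable stays at zero while the constraint is inactive and grows once it is violated, which is the sense in which its trajectory reflects the tension between the objective and the constraint.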