Deep Contrastive Unlearning for Language Models

Abstract

The past few years have witnessed the great success of large language models, which have demonstrated powerful capabilities in comprehending textual data and generating human-like language. Large language models achieve this success by being trained on vast amounts of textual data, including online sources with copyrighted content and user-generated knowledge. However, this comes at a cost: the potential risk of exposing users' privacy and violating copyright protections. Thus, to safeguard individuals' "right to be forgotten", there has been increasing interest in machine unlearning -- the process of removing information carried by particular training samples from a model while not deteriorating its predictive quality. This is a challenging task due to the black-box nature of language models. Most existing studies focus on mitigating the impact of the forgotten samples on a model's outputs, and do not explicitly consider the geometric distribution of samples in the latent space of a model. To address this issue, we propose a machine unlearning framework, named Deep Contrastive Unlearning for fine-Tuning (DeepCUT) language models. Our proposed framework achieves machine unlearning by directly optimizing the latent space of a model. Comprehensive experiments on real-world datasets demonstrate the effectiveness and efficiency of DeepCUT with consistent and significant improvement over baseline methods.
