A Survey on Unlearnable Data

Abstract

Unlearnable data (ULD) has emerged as an innovative defense technique to prevent machine learning models from learning meaningful patterns from specific data, thus protecting data privacy and security. By introducing perturbations to the training data, ULD degrades model performance, making it difficult for unauthorized models to extract useful representations. Despite the growing significance of ULD, existing surveys predominantly focus on related fields, such as adversarial attacks and machine unlearning, with little attention given to ULD as an independent area of study. This survey fills that gap by offering a comprehensive review of ULD, examining unlearnable data generation methods, public benchmarks, evaluation metrics, theoretical foundations and practical applications. We compare and contrast different ULD approaches, analyzing their strengths, limitations, and trade-offs related to unlearnability, imperceptibility, efficiency and robustness. Moreover, we discuss key challenges, such as balancing perturbation imperceptibility with model degradation and the computational complexity of ULD generation. Finally, we highlight promising future research directions to advance the effectiveness and applicability of ULD, underscoring its potential to become a crucial tool in the evolving landscape of data protection in machine learning.
