Abstract
Machine unlearning algorithms are increasingly important as legal concerns arise around the provenance of training data, but verifying the success of unlearning is often difficult. Provable guarantees for unlearning are often limited to supervised learning settings. In this paper, we provide the first theoretical guarantees for unlearning in the pre-training and fine-tuning paradigm by studying topic models, simple bag-of-words language models that can be adapted to solve downstream tasks like retrieval and classification. First, we design a provably effective unlearning algorithm for topic models that incurs a computational overhead independent of the size of the original dataset. Our analysis additionally quantifies the deletion capacity of the model -- i.e., the number of examples that can be unlearned without incurring a significant cost in model performance. Finally, we formally extend our analyses to account for adaptation to a given downstream task. In particular, we design an efficient algorithm to perform unlearning after fine-tuning the topic model via a linear head. Notably, we show that it is easier to unlearn pre-training data from models that have been fine-tuned to a particular task, and one can unlearn this data without modifying the base model.