Provably Confidential Language Modelling

Abstract

Large language models have been shown to memorize private information, such as social security numbers, in training data. Given the sheer scale of the training corpus, it is challenging to screen and filter such private data, either manually or automatically. In this paper, we propose Confidentially Redacted Training (CRT), a method to train language generation models while protecting the confidential segments. We borrow ideas from differential privacy (which solves a related but distinct problem) and show that our method is able to provably prevent unintended memorization by randomizing parts of the training process. Moreover, we show that redaction with an approximately correct screening policy amplifies the confidentiality guarantee. We implement the method for both LSTM and GPT language models. Our experimental results show that the models trained by CRT obtain almost the same perplexity while preserving strong confidentiality.
