Differentially Private Language Models Benefit from Public Pre-training

Abstract

Language modeling is a keystone task in natural language processing. When training a language model on sensitive information, differential privacy (DP) allows us to quantify the degree to which our private data is protected. However, training algorithms which enforce differential privacy often lead to degradation in model quality. We study the feasibility of learning a language model which is simultaneously high-quality and privacy preserving by tuning a public base model on a private corpus. We find that DP fine-tuning boosts the performance of language models in the private domain, making the training of such models possible.
