Differentially Private Distributed Learning for Language Modeling Tasks

Abstract

One of the big challenges in machine learning applications is that training data can differ from the real-world data faced by the algorithm. In language modeling, users' language (e.g., in private messaging) could change within a year and be completely different from what we observe in publicly available data. At the same time, public data can be used to obtain general knowledge (i.e., a general model of English). We study approaches to distributed fine-tuning of a general model on users' private data, with the additional requirements of maintaining quality on the general data and minimizing communication costs. We propose a novel technique that significantly improves prediction quality on users' language compared to a general model and outperforms gradient compression methods in terms of communication efficiency. The proposed procedure is fast and leads to an almost 70% perplexity reduction and an 8.7 percentage point improvement in keystroke saving rate on informal English texts. We also show that the range of tasks our approach is applicable to is not limited to language modeling. Finally, we propose an experimental framework for evaluating the differential privacy of distributed training of language models and show that our approach has good privacy guarantees.
