Federated Learning of N-gram Language Models

Abstract

We propose algorithms to train production-quality n-gram language models using federated learning. Federated learning is a distributed computation platform that can be used to train global models for portable devices such as smartphones. Federated learning is especially relevant for applications handling privacy-sensitive data, such as virtual keyboards, because training is performed without the users' data ever leaving their devices. While the principles of federated learning are fairly generic, its methodology assumes that the underlying models are neural networks. However, virtual keyboards are typically powered by n-gram language models for latency reasons. We propose to train a recurrent neural network language model using the decentralized FederatedAveraging algorithm and to approximate this federated model server-side with an n-gram model that can be deployed to devices for fast inference. Our technical contributions include ways of handling large vocabularies, algorithms to correct capitalization errors in user data, and efficient finite state transducer algorithms to convert word language models to word-piece language models and vice versa. The n-gram language models trained with federated learning are compared to n-grams trained with traditional server-based algorithms using A/B tests on tens of millions of users of a virtual keyboard. Results are presented for two languages, American English and Brazilian Portuguese. This work demonstrates that high-quality n-gram language models can be trained directly on client mobile devices without sensitive training data ever leaving the devices.
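To make the FederatedAveraging step mentioned in the abstract concrete, the following is a minimal sketch of one aggregation round. It is not the paper's implementation: the toy one-parameter linear model stands in for the recurrent neural network, and the names `local_update`, `federated_average`, and `run_round` are hypothetical. The only part taken from the algorithm itself is the idea that clients train locally and the server averages their weights in proportion to local example counts.

```python
from typing import List, Sequence, Tuple

Example = Tuple[float, float]  # (input x, target y) for a toy model y = w * x


def local_update(weights: Sequence[float], examples: Sequence[Example],
                 lr: float = 0.1) -> List[float]:
    """Run one pass of SGD on a client's private data (toy stand-in for the RNN)."""
    w = list(weights)
    for x, y in examples:
        grad = 2.0 * (w[0] * x - y) * x  # gradient of squared error w.r.t. w[0]
        w[0] -= lr * grad
    return w


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Average client weights on the server, weighted by local example count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def run_round(global_weights: Sequence[float],
              clients: Sequence[Sequence[Example]]) -> List[float]:
    """One round: each client trains locally; only weights reach the server."""
    updates = [local_update(global_weights, data) for data in clients]
    sizes = [len(data) for data in clients]
    return federated_average(updates, sizes)


if __name__ == "__main__":
    # Two hypothetical clients whose data follow y = 2x.
    clients = [[(1.0, 2.0), (2.0, 4.0)], [(0.5, 1.0)]]
    weights = [0.0]
    for _ in range(50):
        weights = run_round(weights, clients)
    print(weights)
```

In this sketch the raw examples never leave `local_update`; the server sees only the per-client weight vectors, which mirrors the privacy property the abstract describes.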
