The Secret Sharer: Measuring Unintended Neural Network Memorization & Extracting Secrets

Abstract

Machine learning models based on neural networks and deep learning are being rapidly adopted for many purposes. What those models learn, and what they may share, is a significant concern when the training data may contain secrets and the models are public -- e.g., when a model helps users compose text messages using models trained on all users' messages. This paper presents exposure: a simple-to-compute metric that can be applied to any deep learning model for measuring the memorization of secrets. Using this metric, we show how to extract those secrets efficiently using black-box API access. Further, we show that unintended memorization occurs early, is not due to over-fitting, and is a persistent issue across different types of models, hyperparameters, and training strategies. We experiment with both real-world models (e.g., a state-of-the-art translation model) and datasets (e.g., the Enron email dataset, which contains users' credit card numbers) to demonstrate both the utility of measuring exposure and the ability to extract secrets. Finally, we consider many defenses, finding some ineffective (like regularization), and others to lack guarantees. However, by instantiating our own differentially-private recurrent model, we validate that by appropriately investing in the use of state-of-the-art techniques, the problem can be resolved, with high utility.
