Analyzing Privacy Loss in Updates of Natural Language Models

Abstract

To continuously improve quality and reflect changes in data, machine learning-based services have to regularly re-train and update their core models. In the setting of language models, we show that a comparative analysis of model snapshots before and after an update can reveal a surprising amount of detailed information about the changes in the data used for training before and after the update. We discuss the privacy implications of our findings, propose mitigation strategies and evaluate their effect.
