Making AI Forget You: Data Deletion in Machine Learning

  • 2019-07-11 06:19:51
  • Antonio Ginart, Melody Guan, Gregory Valiant, James Zou
  • 32

Abstract

Intense recent discussions have focused on how to provide individuals with control over when their data can and cannot be used; the EU's Right To Be Forgotten regulation is an example of this effort. In this paper we initiate a framework studying what to do when it is no longer permissible to deploy models derivative from specific user data. In particular, we formulate the problem of how to efficiently delete individual data points from trained machine learning models. For many standard ML models, the only way to completely remove an individual's data is to retrain the whole model from scratch on the remaining data, which is often not computationally practical. We investigate algorithmic principles that enable efficient data deletion in ML. For the specific setting of k-means clustering, we propose two provably deletion efficient algorithms which achieve an average of over 100X improvement in deletion efficiency across 6 datasets, while producing clusters of comparable statistical quality to a canonical k-means++ baseline.
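The abstract contrasts exact deletion by retraining from scratch with deletion-efficient algorithms. The sketch below is a minimal, hypothetical illustration of why partitioning the training work can make exact deletion cheap. It is not either of the two algorithms the paper proposes, and it uses plain Lloyd iterations with a seeded random initialization rather than the k-means++ baseline. Assumptions: points are assigned to shards by id, each shard is clustered independently, and the shard centers are then clustered into the final `k` centers. Every k-means call uses a fixed seed, so deleting a point and retraining only its shard gives exactly the same centers as rerunning this whole pipeline on the remaining data. The names `lloyd_kmeans`, `ShardedKMeans`, `num_shards`, and `seed` are illustrative assumptions.

```python
import random
from typing import Dict, List, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd_kmeans(points: List[Point], k: int, seed: int, iters: int = 20) -> List[Point]:
    """Plain Lloyd's k-means with seeded random initialization (not k-means++)."""
    if len(points) <= k:
        return list(points)  # too few points: each point is its own center
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        # New center = mean of its cluster; an empty cluster keeps its old center.
        centers = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centers[j]
            for j, cluster in enumerate(clusters)
        ]
    return centers


class ShardedKMeans:
    """Hypothetical two-level k-means used only to illustrate cheap exact deletion."""

    def __init__(self, data: Dict[int, Point], k: int, num_shards: int, seed: int = 0):
        self.k, self.num_shards, self.seed = k, num_shards, seed
        # Shard membership depends only on the point id, never on insertion order.
        self.shards: List[Dict[int, Point]] = [{} for _ in range(num_shards)]
        for pid, p in data.items():
            self.shards[pid % num_shards][pid] = p
        self.shard_centers = [self._fit_shard(s) for s in range(num_shards)]
        self.centers = self._aggregate()

    def _fit_shard(self, s: int) -> List[Point]:
        # Sort by id and use a per-shard seed so refitting is fully deterministic.
        pts = [self.shards[s][pid] for pid in sorted(self.shards[s])]
        return lloyd_kmeans(pts, self.k, seed=self.seed + s + 1)

    def _aggregate(self) -> List[Point]:
        # Top level: cluster the small set of shard centers into k final centers.
        all_centers = [c for centers in self.shard_centers for c in centers]
        return lloyd_kmeans(all_centers, self.k, seed=self.seed)

    def delete(self, pid: int) -> int:
        """Remove point `pid` (KeyError if absent); return how many points were refit."""
        s = pid % self.num_shards
        del self.shards[s][pid]
        self.shard_centers[s] = self._fit_shard(s)  # only this shard is retrained
        self.centers = self._aggregate()
        return len(self.shards[s])


if __name__ == "__main__":
    gen = random.Random(0)
    data = {i: (gen.gauss((i % 3) * 5.0, 1.0), gen.gauss(0.0, 1.0)) for i in range(200)}
    model = ShardedKMeans(data, k=3, num_shards=10)
    refit = model.delete(7)
    print(f"retrained {refit} of {len(data) - 1} remaining points")
```

With 200 points and 10 shards, deleting one point refits only the 19 points left in its shard plus the 30 shard centers, instead of all 199 remaining points. The trade-off is that this two-level scheme can produce different (and possibly worse) clusters than single-level k-means; the top level here also ignores how many points each shard center represents, a simplification a real implementation would likely need to address.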
