Bounding the Excess Risk for Linear Models Trained on Marginal-Preserving, Differentially-Private, Synthetic Data

Abstract

The growing use of machine learning (ML) has raised concerns that an ML model may reveal private information about an individual who has contributed to the training dataset. To prevent leakage of sensitive data, we consider using differentially-private (DP), synthetic training data instead of real training data to train an ML model. A key desirable property of synthetic data is its ability to preserve the low-order marginals of the original distribution. Our main contribution comprises novel upper and lower bounds on the excess empirical risk of linear models trained on such synthetic data, for continuous and Lipschitz loss functions. We perform extensive experimentation alongside our theoretical results.
