Optimizing Canaries for Privacy Auditing with Metagradient Descent

Abstract

In this work we study black-box privacy auditing, where the goal is to lower bound the privacy parameter of a differentially private learning algorithm using only the algorithm's outputs (i.e., the final trained model). For DP-SGD (the most successful method for training differentially private deep learning models), the canonical approach to auditing uses membership inference: an auditor prepares a small set of special "canary" examples, inserts a random subset of them into the training set, and then tries to discern which of their canaries were included in the training set (typically via a membership inference attack). The auditor's success rate then provides a lower bound on the privacy parameters of the learning algorithm. Our main contribution is a method for optimizing the auditor's canary set to improve privacy auditing, leveraging recent work on metagradient optimization. Our empirical evaluation demonstrates that by using such optimized canaries, we can improve empirical lower bounds for differentially private image classification models by over 2x in certain instances. Furthermore, we demonstrate that our method is transferable and efficient: canaries optimized for non-private SGD with a small model architecture remain effective when auditing larger models trained with DP-SGD.
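To make the link between the auditor's success rate and a privacy lower bound concrete, the following is a minimal sketch and not necessarily the estimator used in this work. It relies on the standard consequence of (ε, δ)-differential privacy that any membership test satisfies TPR ≤ e^ε · FPR + δ, so ε ≥ log((TPR − δ) / FPR). The function name `empirical_epsilon_lower_bound`, the parameter `beta`, and the use of Hoeffding confidence bounds (which assume the auditor's guesses are independent trials) are illustrative assumptions.

```python
import math


def empirical_epsilon_lower_bound(included, guessed_in, delta=0.0, beta=0.025):
    """Hypothetical helper: lower bound on epsilon from membership guesses.

    included[i]   -- True if canary i was actually inserted into training.
    guessed_in[i] -- True if the auditor guessed canary i was inserted.
    delta         -- the delta of the (epsilon, delta)-DP guarantee audited.
    beta          -- failure probability of each one-sided Hoeffding bound
                     (assumes independent guesses; overall confidence is
                     at least 1 - 2 * beta by a union bound).
    """
    n_in = sum(1 for inc in included if inc)
    n_out = len(included) - n_in
    if n_in == 0 or n_out == 0:
        raise ValueError("need both included and excluded canaries")

    # Count true positives and false positives of the auditor's guesses.
    tp = sum(1 for inc, g in zip(included, guessed_in) if inc and g)
    fp = sum(1 for inc, g in zip(included, guessed_in) if not inc and g)

    # Conservative estimates: lower bound on TPR, upper bound on FPR.
    tpr_low = tp / n_in - math.sqrt(math.log(1 / beta) / (2 * n_in))
    fpr_high = min(1.0, fp / n_out + math.sqrt(math.log(1 / beta) / (2 * n_out)))

    # DP implies TPR <= exp(eps) * FPR + delta; invert for eps.
    if tpr_low - delta <= 0:
        return 0.0
    return max(0.0, math.log((tpr_low - delta) / fpr_high))


# Usage: 1000 inserted and 1000 held-out canaries, perfectly distinguished.
included = [True] * 1000 + [False] * 1000
perfect_guesses = list(included)
eps_perfect = empirical_epsilon_lower_bound(included, perfect_guesses)

# An auditor that never guesses "in" learns nothing, so the bound is 0.
eps_trivial = empirical_epsilon_lower_bound(included, [False] * 2000)
```

Under this sketch, canaries that are easier to distinguish raise the true positive rate at a fixed false positive rate, which is why optimizing the canary set can tighten the resulting lower bound.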
