Data Leakage in Federated Averaging

Abstract

Recent attacks have shown that user data can be reconstructed from FedSGD updates, thus breaking privacy. However, these attacks are of limited practical relevance as federated learning typically uses the FedAvg algorithm. It is generally accepted that reconstructing data from FedAvg updates is much harder than FedSGD as: (i) there are unobserved intermediate weight updates, (ii) the order of inputs matters, and (iii) the order of labels changes every epoch. In this work, we propose a new optimization-based attack which successfully attacks FedAvg by addressing the above challenges. First, we solve the optimization problem using automatic differentiation that forces a simulation of the client's update for the reconstructed labels and inputs so as to match the received client update. Second, we address the unknown input order by treating images at different epochs as independent during optimization, while relating them with a permutation invariant prior. Third, we reconstruct the labels by estimating the parameters of existing FedSGD attacks at every FedAvg step. On the popular FEMNIST dataset, we demonstrate that on average we successfully reconstruct >45% of the client's images from realistic FedAvg updates computed on 10 local epochs of 10 batches each with 5 images, compared to only <10% using the baseline. These findings indicate that many real-world federated learning implementations based on FedAvg are vulnerable.
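
To make the three challenges concrete, the following is a minimal sketch on a toy linear model, not the implementation used in the paper. It contrasts a FedSGD update (a single gradient, which does not depend on input order) with a FedAvg update (several local steps, of which only the final weight change is shared, and which does depend on the order). All function names, the toy data, and the learning rate are hypothetical. Using the mean as the order-invariant summary that relates epochs is an assumption made here for illustration.

```python
def grad_mse(w, batch):
    """Gradient of the mean squared error of a linear model w.x over one batch."""
    g = [0.0] * len(w)
    for x, y in batch:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            g[i] += 2.0 * err * xi / len(batch)
    return g


def fedsgd_update(w, data):
    """FedSGD: the client shares one gradient computed over all of its data."""
    return grad_mse(w, data)


def fedavg_update(w, epochs_of_batches, lr):
    """FedAvg: many local SGD steps; only the final weight delta is shared."""
    w_local = list(w)
    for batches in epochs_of_batches:   # one entry per local epoch
        for batch in batches:           # intermediate weights are never sent
            g = grad_mse(w_local, batch)
            w_local = [wi - lr * gi for wi, gi in zip(w_local, g)]
    return [wl - wi for wl, wi in zip(w_local, w)]


def epoch_mean(inputs):
    """Order-invariant summary of one epoch's inputs (assumed choice: the mean)."""
    n = len(inputs)
    return [sum(x[i] for x in inputs) / n for i in range(len(inputs[0]))]


def order_invariant_penalty(inputs_per_epoch):
    """Squared distance between the summaries of consecutive epochs."""
    means = [epoch_mean(e) for e in inputs_per_epoch]
    return sum(
        sum((a - b) ** 2 for a, b in zip(m1, m2))
        for m1, m2 in zip(means, means[1:])
    )


# Hypothetical toy client data: two (input, label) pairs.
a = ([1.0, 0.0], 1.0)
b = ([1.0, 1.0], 0.0)
w0 = [0.0, 0.0]

# FedSGD: the shared gradient is the same for either input order.
sgd_ab = fedsgd_update(w0, [a, b])
sgd_ba = fedsgd_update(w0, [b, a])

# FedAvg: one local epoch, batch size 1, two different batch orders.
delta_ab = fedavg_update(w0, [[[a], [b]]], lr=0.1)
delta_ba = fedavg_update(w0, [[[b], [a]]], lr=0.1)

# The prior ignores the order within an epoch but reacts to different content.
same_set = order_invariant_penalty([[a[0], b[0]], [b[0], a[0]]])
diff_set = order_invariant_penalty([[a[0], b[0]], [b[0], [0.0, 0.0]]])

print("FedSGD:", sgd_ab, sgd_ba)
print("FedAvg:", delta_ab, delta_ba)
print("penalty:", same_set, diff_set)
```

In this sketch the two FedAvg deltas differ although the client holds the same two samples, which is why an attacker who simulates the client update has to account for the unknown order. The penalty is zero when two epochs contain the same inputs in a different order, so it can tie independently optimized per-epoch reconstructions together without fixing an order.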
