Dataset-Level Attribute Leakage in Collaborative Learning

Abstract

Secure multi-party machine learning allows several parties to build a model on their pooled data to increase utility while not explicitly sharing data with each other. We show that such multi-party computation can cause leakage of global dataset properties between the parties even when parties obtain only black-box access to the final model. In particular, a "curious" party can infer the distribution of sensitive attributes in other parties' data with high accuracy. This raises concerns regarding the confidentiality of properties pertaining to the whole dataset as opposed to individual data records. We show that our attack can leak population-level properties in datasets of different types, including tabular, text, and graph data. To understand and measure the source of leakage, we consider several models of correlation between a sensitive attribute and the rest of the data. Using multiple machine learning models, we show that leakage occurs even if the sensitive attribute is not included in the training data and has a low correlation with other attributes and the target variable.
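The threat model above (a curious party with only black-box access to the jointly trained model) can be illustrated with a generic shadow-model property inference sketch. This is a toy illustration under stated assumptions, not the authors' attack, models, or datasets. The generator `make_party_data`, the two candidate fractions 0.2 and 0.8, the logistic-regression "joint model", and the nearest-centroid meta-classifier are all hypothetical choices made for the example. The attacker knows its own data, queries the joint model on a fixed probe set, trains shadow models on its own data pooled with synthetic data of each candidate fraction, and picks the candidate whose average shadow outputs lie closest to the joint model's outputs. As in the abstract, the sensitive attribute is never a model input; unlike the low-correlation settings the abstract describes, this toy lets it shift the label strongly so the signal is easy to see.

```python
import math
import random


def make_party_data(n, frac_sensitive, rng):
    """Toy generator (hypothetical): returns a list of ((x1, x2), y) records.

    A binary sensitive attribute s is drawn per record but never returned as a
    feature. It weakly shifts x2 and strongly shifts the label y.
    """
    data = []
    for _ in range(n):
        s = 1 if rng.random() < frac_sensitive else 0
        x1 = rng.gauss(0.0, 1.0)
        x2 = rng.gauss(0.0, 1.0) + 0.3 * s  # weak proxy for s
        y = 1 if x1 + (2 * s - 1) + rng.gauss(0.0, 0.5) > 0 else 0
        data.append(((x1, x2), y))
    return data


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(data, epochs=100, lr=0.5):
    """Full-batch gradient descent for logistic regression on (x1, x2).

    Returns only a query function, mirroring black-box access to the model.
    """
    w1 = w2 = b = 0.0
    n = len(data)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (x1, x2), y in data:
            err = sigmoid(w1 * x1 + w2 * x2 + b) - y
            g1 += err * x1
            g2 += err * x2
            gb += err
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return lambda x: sigmoid(w1 * x[0] + w2 * x[1] + b)


def query(model, probes):
    return [model(x) for x in probes]


def infer_fraction(target_model, own_data, candidates, probes, rng,
                   n_shadow=10, n_other=200):
    """Guess which candidate fraction of s = 1 the other party's data has."""
    centroids = {}
    for frac in candidates:
        outputs = []
        for _ in range(n_shadow):
            # Shadow model: the attacker's real data pooled with synthetic
            # "other party" data whose sensitive fraction is known.
            pooled = own_data + make_party_data(n_other, frac, rng)
            outputs.append(query(train_logreg(pooled), probes))
        # Nearest-centroid meta-classifier: mean output vector per candidate.
        centroids[frac] = [sum(col) / n_shadow for col in zip(*outputs)]
    observed = query(target_model, probes)
    return min(candidates, key=lambda f: math.dist(observed, centroids[f]))


if __name__ == "__main__":
    rng = random.Random(0)
    attacker_data = make_party_data(200, 0.5, rng)
    other_data = make_party_data(200, 0.8, rng)  # secret to the attacker
    joint_model = train_logreg(attacker_data + other_data)
    probes = [x for x, _ in make_party_data(20, 0.5, rng)]
    print(infer_fraction(joint_model, attacker_data, [0.2, 0.8], probes, rng))
```

Nearest-centroid keeps the sketch dependency-free; a learned meta-classifier over the shadow models' output vectors could replace it, and the binary choice between two fractions could be widened to a finer grid of candidates.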
