Abstract
Machine learning models can leak private information about their training data, but the standard methods to measure this risk, based on membership inference attacks (MIAs), have a major limitation. They only check if a given data point \textit{exactly} matches a training point, neglecting the potential of similar or partially overlapping data revealing the same private information. To address this issue, we introduce the class of range membership inference attacks (RaMIAs), testing if the model was trained on any data in a specified range (defined based on the semantics of privacy). We formulate the RaMIA game and design a principled statistical test for its complex hypotheses. We show that RaMIAs can capture privacy loss more accurately and comprehensively than MIAs on various types of data, such as tabular, image, and language. RaMIA paves the way for more comprehensive and meaningful privacy auditing of machine learning algorithms.