SMA: Who Said That? Auditing Membership Leakage in Semi-Black-box RAG Controlling

Abstract

Retrieval-Augmented Generation (RAG) and its multimodal extension, Multimodal Retrieval-Augmented Generation (MRAG), significantly improve the knowledge coverage and contextual understanding of Large Language Models (LLMs) by introducing external knowledge sources. However, retrieval and multimodal fusion obscure content provenance, rendering existing membership inference methods unable to reliably attribute generated outputs to pre-training, external retrieval, or user input, thus undermining privacy leakage accountability. To address these challenges, we propose the first Source-aware Membership Audit (SMA) that enables fine-grained source attribution of generated content in a semi-black-box setting with retrieval control capabilities. To cope with the constraints of semi-black-box auditing, we further design an attribution estimation mechanism based on zero-order optimization, which robustly approximates the true influence of input tokens on the output through large-scale perturbation sampling and ridge regression modeling. In addition, SMA introduces a cross-modal attribution technique that projects image inputs into textual descriptions via MLLMs, enabling token-level attribution in the text modality, which for the first time facilitates membership inference on image retrieval traces in MRAG systems. This work shifts the focus of membership inference from 'whether the data has been memorized' to 'where the content is sourced from', offering a novel perspective for auditing data provenance in complex generative systems.
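
The perturbation-sampling and ridge-regression idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `ridge_attribution`, the black-box callable `score_fn`, the masking probability, and the toy linear scorer are all assumptions made for illustration. In a real audit, `score_fn` would query the RAG system with some input tokens masked out and return a scalar measuring how strongly the output is preserved.

```python
import numpy as np

def ridge_attribution(score_fn, n_tokens, n_samples=2000,
                      keep_prob=0.5, alpha=1.0, seed=0):
    """Estimate per-token influence from black-box queries only.

    score_fn is a hypothetical callable: it takes a 0/1 mask over the
    input tokens and returns a scalar output score. No gradients are used.
    """
    rng = np.random.default_rng(seed)
    # Sample random keep/drop masks (1.0 = token kept, 0.0 = token masked).
    masks = (rng.random((n_samples, n_tokens)) < keep_prob).astype(float)
    scores = np.array([score_fn(m) for m in masks])
    # Center features and targets so that no intercept term is needed.
    X = masks - masks.mean(axis=0)
    y = scores - scores.mean()
    # Closed-form ridge solution: (X^T X + alpha * I)^{-1} X^T y.
    A = X.T @ X + alpha * np.eye(n_tokens)
    return np.linalg.solve(A, X.T @ y)

# Toy stand-in for the audited system (an assumption, not a real model):
# the score depends linearly on which tokens are kept.
true_w = np.array([0.0, 2.0, 0.0, -1.0, 0.5])

def toy_score(mask):
    return float(mask @ true_w)

weights = ridge_attribution(toy_score, n_tokens=5)
print(np.round(weights, 2))
```

The fitted coefficients act as a surrogate for each token's influence on the output. The ridge penalty `alpha` keeps the estimate stable when sampled masks are correlated or the scores are noisy, at the cost of slightly shrinking the coefficients toward zero.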
