Can We Infer Confidential Properties of Training Data from LLMs?

Abstract

Large language models (LLMs) are increasingly fine-tuned on domain-specific datasets to support applications in fields such as healthcare, finance, and law. These fine-tuning datasets often have sensitive and confidential dataset-level properties -- such as patient demographics or disease prevalence -- that are not intended to be revealed. While prior work has studied property inference attacks on discriminative models (e.g., image classification models) and generative models (e.g., GANs for image data), it remains unclear if such attacks transfer to LLMs. In this work, we introduce PropInfer, a benchmark task for evaluating property inference in LLMs under two fine-tuning paradigms: question-answering and chat-completion. Built on the ChatDoctor dataset, our benchmark includes a range of property types and task configurations. We further propose two tailored attacks: a prompt-based generation attack and a shadow-model attack leveraging word frequency signals. Empirical evaluations across multiple pretrained LLMs show the success of our attacks, revealing a previously unrecognized vulnerability in LLMs.
