Privacy Evaluation Benchmarks for NLP Models

Abstract

By mounting privacy attacks on NLP models, attackers can obtain sensitive information such as training data and model parameters. Although researchers have studied several kinds of attacks on NLP models in depth, these analyses are not systematic, and a comprehensive understanding of the impact of the attacks is lacking. For example, we must consider which attacks apply to which scenarios, what common factors affect the performance of different attacks, how different attacks relate to one another, and how various datasets and models influence the effectiveness of the attacks. Therefore, we need a benchmark to holistically assess the privacy risks faced by NLP models. In this paper, we present a privacy attack and defense evaluation benchmark in the field of NLP, which covers both conventional/small models and large language models (LLMs). This benchmark supports a variety of models, datasets, and protocols, along with standardized modules for comprehensive evaluation of attacks and defense strategies. Based on this framework, we present a study on the association between auxiliary data from different domains and the strength of privacy attacks, and we provide an improved attack method for this scenario with the help of Knowledge Distillation (KD). Furthermore, we propose a chained framework for privacy attacks, allowing a practitioner to chain multiple attacks to achieve a higher-level attack objective. Based on this, we provide some defense and enhanced attack strategies. The code for reproducing the results can be found at https://github.com/user2311717757/nlp_doctor.
