Text Revealer: Private Text Reconstruction via Model Inversion Attacks against Transformers

Abstract

Text classification is widely used in natural language processing applications such as sentiment analysis. Current applications often use large transformer-based language models to classify input texts. However, there is a lack of systematic study on how much private information can be inverted when publishing models. In this paper, we formulate \emph{Text Revealer} -- the first model inversion attack for text reconstruction against text classification with transformers. Our attacks faithfully reconstruct private texts included in the training data, given access to the target model. We leverage an external dataset and GPT-2 to generate fluent text resembling the target domain, and then perturb its hidden state optimally using feedback from the target model. Our extensive experiments demonstrate that our attacks are effective for datasets with different text lengths and can reconstruct private texts accurately.
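
The perturbation step described above can be illustrated with a toy sketch. This is not the authors' implementation: the linear-softmax "target model", the two-dimensional hidden state, and the names `target_probs`, `perturb_hidden`, `step_size`, and `steps` are all assumptions made for illustration. It only shows the general idea of nudging a hidden state by gradient ascent so that a classifier assigns higher probability to a chosen label.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def target_probs(hidden, weights):
    """Hypothetical stand-in for the target classifier: linear layer + softmax."""
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in weights]
    return softmax(logits)


def perturb_hidden(hidden, weights, target_label, step_size=0.5, steps=20):
    """Gradient ascent on log p(target_label | hidden) for the toy classifier."""
    h = list(hidden)
    for _ in range(steps):
        probs = target_probs(h, weights)
        # For a linear-softmax model, d/dh log p(t | h) = W[t] - sum_k p_k * W[k].
        grad = [
            weights[target_label][j]
            - sum(p * row[j] for p, row in zip(probs, weights))
            for j in range(len(h))
        ]
        h = [hj + step_size * gj for hj, gj in zip(h, grad)]
    return h


# Toy setup (assumed values): two classes, two hidden dimensions.
weights = [[1.0, -1.0], [-1.0, 1.0]]
hidden = [0.5, -0.5]  # initially favours class 0

before = target_probs(hidden, weights)[1]
after = target_probs(perturb_hidden(hidden, weights, target_label=1), weights)[1]
print(f"p(label=1) before: {before:.3f}, after: {after:.3f}")
```

In a realistic setting the hidden state would come from a language model such as GPT-2 and the gradient would be obtained by automatic differentiation through the target classifier; the closed-form gradient here is available only because the toy classifier is linear.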
