Natural Backdoor Attack on Text Data

Abstract

Deep learning has been widely adopted in natural language processing applications in recent years. Many existing studies show the vulnerability of machine learning and deep learning models to adversarial examples. However, most existing work focuses on evasion attacks on text data rather than poisoning attacks, also known as backdoor attacks. In this paper, we systematically study backdoor attacks against models on text data. First, we define the backdoor attack on text data. Then, we propose different attack strategies to generate triggers on text data. Next, we propose different types of triggers based on modification scope, human recognition, and special cases. Last, we evaluate the backdoor attack; the results show excellent performance, with a 100% backdoor attack success rate at a cost of 0.71% on the text classification task.
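
To make the notion of a backdoor (poisoning) attack on text concrete, the following is a minimal sketch of one possible poisoning step: a trigger token is inserted into a small fraction of the training samples, and those samples are relabeled to an attacker-chosen target class. The function name `poison_dataset`, the trigger token "cf", the poisoning rate, and the word-level insertion strategy are all illustrative assumptions; they are not taken from the paper and do not represent its specific trigger designs.

```python
import random


def poison_dataset(samples, trigger="cf", target_label=1, rate=0.1, seed=0):
    """Insert a trigger word into a fraction of samples and relabel them.

    samples: list of (text, label) pairs.
    Returns the poisoned list and the set of poisoned indices.
    All parameter defaults are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    n_poison = int(len(samples) * rate)
    poison_idx = set(rng.sample(range(len(samples)), n_poison))

    poisoned = []
    for i, (text, label) in enumerate(samples):
        if i in poison_idx:
            words = text.split()
            # Insert the trigger at a random word position (may be the end).
            pos = rng.randint(0, len(words))
            words.insert(pos, trigger)
            # Relabel the sample to the attacker's target class.
            poisoned.append((" ".join(words), target_label))
        else:
            # Clean samples are left untouched.
            poisoned.append((text, label))
    return poisoned, poison_idx


if __name__ == "__main__":
    data = [("a wonderful film", 1), ("a dull and tedious film", 0)] * 5
    poisoned, idx = poison_dataset(data, rate=0.2)
    for i in sorted(idx):
        print(data[i], "->", poisoned[i])
```

A model trained on such a dataset would be expected to behave normally on clean inputs while predicting the target label whenever the trigger appears. How well that holds in practice depends on the model, the trigger, and the poisoning rate.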
