HotFlip: White-Box Adversarial Examples for Text Classification

  • 2018-05-24 16:43:45
  • Javid Ebrahimi, Anyi Rao, Daniel Lowd, Dejing Dou

Abstract

We propose an efficient method to generate white-box adversarial examples to trick a character-level neural classifier. We find that only a few manipulations are needed to greatly decrease the accuracy. Our method relies on an atomic flip operation, which swaps one token for another, based on the gradients of the one-hot input vectors. Due to the efficiency of our method, we can perform adversarial training, which makes the model more robust to attacks at test time. With the use of a few semantics-preserving constraints, we demonstrate that HotFlip can be adapted to attack a word-level classifier as well.
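To make the flip operation concrete, the following is a minimal sketch of how the gradient with respect to the one-hot inputs can rank single-character swaps. It is not the authors' implementation: the toy linear classifier, the weight tensor W, and the helper names loss_and_grad and best_flip are assumptions made for illustration only. The estimated loss change for swapping character a for character b at position i is taken to be the difference of the two corresponding gradient entries.

```python
import math
import random


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def loss_and_grad(chars, W, y):
    """Toy linear classifier (an assumption, not the paper's model).

    chars: list of character ids, i.e. the one-hot input in index form.
    W[i][v][c]: weight of character v at position i for class c.
    Returns the cross-entropy loss for label y and grad[i][v], the
    derivative of the loss with respect to one-hot entry (i, v).
    """
    n_classes = len(W[0][0])
    logits = [sum(W[i][a][c] for i, a in enumerate(chars))
              for c in range(n_classes)]
    p = softmax(logits)
    loss = -math.log(p[y])
    # d(loss)/d(logit_c) for softmax + cross-entropy
    d = [p[c] - (1.0 if c == y else 0.0) for c in range(n_classes)]
    grad = [[sum(W[i][v][c] * d[c] for c in range(n_classes))
             for v in range(len(W[i]))]
            for i in range(len(chars))]
    return loss, grad


def best_flip(chars, W, y):
    """Return (position, new_char, estimated_gain) for the single swap
    with the largest first-order estimate of the loss increase."""
    _, grad = loss_and_grad(chars, W, y)
    best = None
    for i, a in enumerate(chars):
        for b in range(len(grad[i])):
            if b == a:
                continue  # skip the no-op "flip"
            gain = grad[i][b] - grad[i][a]
            if best is None or gain > best[2]:
                best = (i, b, gain)
    return best


# Hypothetical toy setup: 5 positions, 4 characters, 2 classes.
random.seed(0)
n, V, C = 5, 4, 2
W = [[[random.gauss(0, 1) for _ in range(C)] for _ in range(V)]
     for _ in range(n)]
chars = [random.randrange(V) for _ in range(n)]
y = 0

loss_before, _ = loss_and_grad(chars, W, y)
i, b, gain = best_flip(chars, W, y)
flipped = chars[:i] + [b] + chars[i + 1:]
loss_after, _ = loss_and_grad(flipped, W, y)
print("flip position", i, "to char", b)
print("estimated gain", gain, "actual gain", loss_after - loss_before)
```

One gradient computation scores every possible single swap, which is the source of the efficiency mentioned in the abstract. In this toy model the loss is convex in the input, so the estimate is a lower bound on the true change; for a real neural classifier the gradient would come from backpropagation and the estimate would only be approximate.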

 
