Adversarial Semantic Collisions

  • 2020-11-09 20:42:01
  • Congzheng Song, Alexander M. Rush, Vitaly Shmatikov
  • 27

Abstract

We study semantic collisions: texts that are semantically unrelated but judged as similar by NLP models. We develop gradient-based approaches for generating semantic collisions and demonstrate that state-of-the-art models for many tasks which rely on analyzing the meaning and similarity of texts (including paraphrase identification, document retrieval, response suggestion, and extractive summarization) are vulnerable to semantic collisions. For example, given a target query, inserting a crafted collision into an irrelevant document can shift its retrieval rank from 1000 to top 3. We show how to generate semantic collisions that evade perplexity-based filtering and discuss other potential mitigations. Our code is available at https://github.com/csong27/collision-bert.
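
To make the idea of a gradient-based collision search concrete, here is a minimal toy sketch. It is not the authors' implementation (see the repository linked above for that). It assumes a hypothetical four-word vocabulary with made-up 2-d embeddings, and a stand-in "model" that scores a text by the dot product between a target vector and the mean embedding of the text's tokens. The names `EMBEDDINGS`, `softmax`, and `generate_collision` are illustrative only. The sketch relaxes each token position to a softmax distribution over the vocabulary, runs gradient ascent on the logits, and then decodes by argmax.

```python
import math

# Hypothetical 2-d embeddings for a four-word vocabulary (made-up values).
EMBEDDINGS = {
    "cat": (1.0, 0.0),
    "dog": (0.9, 0.1),
    "tax": (0.0, 1.0),
    "law": (0.1, 0.9),
}
WORDS = list(EMBEDDINGS)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate_collision(target, length=3, steps=50, lr=1.0):
    """Gradient ascent on per-position logits, then argmax decoding.

    The toy "model" scores a text by the dot product between `target`
    and the mean embedding of its (soft) tokens.
    """
    # Per-word gain: how well each word's embedding aligns with the target.
    gains = [sum(t * e for t, e in zip(target, EMBEDDINGS[w])) for w in WORDS]
    logits = [[0.0] * len(WORDS) for _ in range(length)]
    for _ in range(steps):
        for pos in range(length):
            probs = softmax(logits[pos])
            expected = sum(p * g for p, g in zip(probs, gains))
            # d(score)/d(logit_w) = p_w * (gain_w - expected gain) / length
            for w in range(len(WORDS)):
                logits[pos][w] += lr * probs[w] * (gains[w] - expected) / length
    # Discretize: pick the highest-scoring word at each position.
    return [WORDS[max(range(len(WORDS)), key=row.__getitem__)] for row in logits]
```

Calling `generate_collision((0.0, 1.0))` on this toy setup yields a sequence made only of the word whose embedding best matches the target, which is semantically meaningless yet maximally "similar" under the stand-in scorer. A real attack would replace the scorer with the victim model and, to evade perplexity-based filtering, add a language-model fluency term to the objective.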

 
