Adversarial Semantic Collisions

Abstract

We study semantic collisions: texts that are semantically unrelated but judged as similar by NLP models. We develop gradient-based approaches for generating semantic collisions and demonstrate that state-of-the-art models for many tasks which rely on analyzing the meaning and similarity of texts (including paraphrase identification, document retrieval, response suggestion, and extractive summarization) are vulnerable to semantic collisions. For example, given a target query, inserting a crafted collision into an irrelevant document can shift its retrieval rank from 1000 to top 3. We show how to generate semantic collisions that evade perplexity-based filtering and discuss other potential mitigations. Our code is available at https://github.com/csong27/collision-bert.
