Humpty Dumpty: Controlling Word Meanings via Corpus Poisoning

Abstract

Word embeddings, i.e., low-dimensional vector representations such as GloVe and SGNS, encode word "meaning" in the sense that distances between words' vectors correspond to their semantic proximity. This enables transfer learning of semantics for a variety of natural language processing tasks. Word embeddings are typically trained on large public corpora such as Wikipedia or Twitter. We demonstrate that an attacker who can modify the corpus on which the embedding is trained can control the "meaning" of new and existing words by changing their locations in the embedding space. We develop an explicit expression over corpus features that serves as a proxy for distance between words and establish a causative relationship between its values and embedding distances. We then show how to use this relationship for two adversarial objectives: (1) make a word a top-ranked neighbor of another word, and (2) move a word from one semantic cluster to another. An attack on the embedding can affect diverse downstream tasks, demonstrating for the first time the power of data poisoning in transfer learning scenarios. We use this attack to manipulate query expansion in information retrieval systems such as resume search, make certain names more or less visible to named entity recognition models, and cause new words to be translated to a particular target word regardless of the language. Finally, we show how the attacker can generate linguistically likely corpus modifications, thus fooling defenses that attempt to filter implausible sentences from the corpus using a language model.
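
To make the first adversarial objective concrete, the following minimal sketch shows what "top-ranked neighbor" means when proximity is measured by cosine similarity. It is not the attack described in the abstract: the toy 2-D vectors, the word list, and the helper names `cosine` and `rank_neighbors` are all assumptions made for illustration, and the "poisoned" vector is set by hand to stand in for the effect a corpus modification might have after retraining.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_neighbors(embeddings, query):
    """Return all words except `query`, most similar to `query` first."""
    others = [w for w in embeddings if w != query]
    return sorted(others, key=lambda w: cosine(embeddings[query], embeddings[w]),
                  reverse=True)

# Hypothetical toy embedding (invented values, not trained on any corpus).
toy = {
    "doctor": [1.0, 0.1],
    "nurse": [0.9, 0.3],
    "banana": [0.0, 1.0],
}
before = rank_neighbors(toy, "doctor")

# Stand-in for the outcome of a successful attack: "banana" has been
# moved close to "doctor" in the embedding space.
poisoned = dict(toy)
poisoned["banana"] = [1.0, 0.12]
after = rank_neighbors(poisoned, "doctor")

print(before)
print(after)
```

In this toy setting the source word "banana" goes from the least similar word to the top-ranked neighbor of the target "doctor"; a downstream component that expands queries with nearest neighbors would then pick it up.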
