Text Processing Like Humans Do: Visually Attacking and Shielding NLP Systems

Abstract

Visual modifications to text are often used to obfuscate offensive comments in social media (e.g., "!d10t") or as a writing style ("1337" in "leet speak"), among other scenarios. We consider this as a new type of adversarial attack in NLP, a setting to which humans are very robust, as our experiments with both simple and more difficult visual input perturbations demonstrate. We then investigate the impact of visual adversarial attacks on current NLP systems on character-, word-, and sentence-level tasks, showing that both neural and non-neural models are, in contrast to humans, extremely sensitive to such attacks, suffering performance decreases of up to 82%. We then explore three shielding methods (visual character embeddings, adversarial training, and rule-based recovery), which substantially improve the robustness of the models. However, the shielding methods still fall behind performances achieved in non-attack scenarios, which demonstrates the difficulty of dealing with visual attacks.
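To make the idea of a visual perturbation and a rule-based recovery concrete, the following is a minimal sketch only. It assumes a small hand-written leet-style substitution table; the table `LOOKALIKES`, the functions `perturb` and `recover`, and the perturbation probability `p` are hypothetical names chosen for illustration and are not taken from the paper, whose actual attack and shielding methods may differ.

```python
import random

# Hypothetical leet-style substitution table (an assumption for illustration,
# not the character mapping used in the paper).
LOOKALIKES = {"i": "!", "o": "0", "e": "3", "l": "1", "t": "7", "a": "4"}

# Inverse table used by the naive rule-based recovery below.
RECOVER = {v: k for k, v in LOOKALIKES.items()}


def perturb(text, p, seed=0):
    """Replace each mappable character with its look-alike with probability p."""
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    return "".join(
        LOOKALIKES[c] if c in LOOKALIKES and rng.random() < p else c
        for c in text
    )


def recover(text):
    """Map every known look-alike symbol back to its plain letter."""
    return "".join(RECOVER.get(c, c) for c in text)


if __name__ == "__main__":
    attacked = perturb("you are an idiot", p=0.5)
    print(attacked)
    print(recover(attacked))
```

A recovery rule this naive also rewrites genuine digits and punctuation (a real "10" would become "lo"), which hints at why rule-based shielding alone may not restore performance to non-attack levels.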
