Abstract
We introduce a novel family of adversarial attacks that exploit the inability of language models to interpret ASCII art. To evaluate these attacks, we propose the ToxASCII benchmark and develop two custom ASCII art fonts: one leveraging special tokens and another using text-filled letter shapes. Our attacks achieve a perfect 1.0 Attack Success Rate across ten models, including OpenAI's o1-preview and LLaMA 3.1. Warning: this paper contains examples of toxic language used for research purposes.