Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models

Abstract

Large-scale pre-trained language models have achieved tremendous success across a wide range of natural language understanding (NLU) tasks, even surpassing human performance. However, recent studies reveal that the robustness of these models can be challenged by carefully crafted textual adversarial examples. While several individual datasets have been proposed to evaluate model robustness, a principled and comprehensive benchmark is still missing. In this paper, we present Adversarial GLUE (AdvGLUE), a new multi-task benchmark to quantitatively and thoroughly explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks. In particular, we systematically apply 14 textual adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations. Our findings are summarized as follows. (i) Most existing adversarial attack algorithms are prone to generating invalid or ambiguous adversarial examples, with around 90% of them either changing the original semantic meanings or misleading human annotators as well. Therefore, we perform a careful filtering process to curate a high-quality benchmark. (ii) All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy. We hope our work will motivate the development of new adversarial attacks that are more stealthy and semantic-preserving, as well as new robust language models against sophisticated adversarial attacks. AdvGLUE is available at https://adversarialglue.github.io.
