SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems

Abstract

In the last year, new models and methods for pretraining and transfer learning have driven striking performance improvements across a range of language understanding tasks. The GLUE benchmark, introduced a little over one year ago, offers a single-number metric that summarizes progress on a diverse set of such tasks, but performance on the benchmark has recently surpassed the level of non-expert humans, suggesting limited headroom for further research. In this paper we present SuperGLUE, a new benchmark styled after GLUE with a new set of more difficult language understanding tasks, a software toolkit, and a public leaderboard. SuperGLUE is available at super.gluebenchmark.com.
