GLGE: A New General Language Generation Evaluation Benchmark

  • 2021-06-01 08:01:50
  • Dayiheng Liu, Yu Yan, Yeyun Gong, Weizhen Qi, Hang Zhang, Jian Jiao, Weizhu Chen, Jie Fu, Linjun Shou, Ming Gong, Pengcheng Wang, Jiusheng Chen, Daxin Jiang, Jiancheng Lv, Ruofei Zhang, Winnie Wu, Ming Zhou, Nan Duan

Abstract

Multi-task benchmarks such as GLUE and SuperGLUE have driven great progress of pretraining and transfer learning in Natural Language Processing (NLP). These benchmarks mostly focus on a range of Natural Language Understanding (NLU) tasks, without considering the Natural Language Generation (NLG) models. In this paper, we present the General Language Generation Evaluation (GLGE), a new multi-task benchmark for evaluating the generalization capabilities of NLG models across eight language generation tasks. For each task, we continue to design three subtasks in terms of task difficulty (GLGE-Easy, GLGE-Medium, and GLGE-Hard). This introduces 24 subtasks to comprehensively compare model performance. To encourage research on pretraining and transfer learning on NLG models, we make GLGE publicly available and build a leaderboard with strong baselines including MASS, BART, and ProphetNet (The source code and dataset are publicly available at https://github.com/microsoft/glge).

 
