CUGE: A Chinese Language Understanding and Generation Evaluation Benchmark

  • 2022-06-14 08:19:35
  • Yuan Yao, Qingxiu Dong, Jian Guan, Boxi Cao, Zhengyan Zhang, Chaojun Xiao, Xiaozhi Wang, Fanchao Qi, Junwei Bao, Jinran Nie, Zheni Zeng, Yuxian Gu, Kun Zhou, Xuancheng Huang, Wenhao Li, Shuhuai Ren, Jinliang Lu, Chengqiang Xu, Huadong Wang, Guoyang Zeng, Zile Zhou, Jiajun Zhang, Juanzi Li, Minlie Huang, Rui Yan, Xiaodong He, Xiaojun Wan, Xin Zhao, Xu Sun, Yang Liu, Zhiyuan Liu, Xianpei Han, Erhong Yang, Zhifang Sui, Maosong Sun

Abstract

Realizing general-purpose language intelligence has been a longstanding goal for natural language processing, where standard evaluation benchmarks play a fundamental and guiding role. We argue that for general-purpose language intelligence evaluation, the benchmark itself needs to be comprehensive and systematic. To this end, we propose CUGE, a Chinese Language Understanding and Generation Evaluation benchmark with the following features: (1) Hierarchical benchmark framework, where datasets are principally selected and organized with a language capability-task-dataset hierarchy. (2) Multi-level scoring strategy, where different levels of model performance are provided based on the hierarchical framework. To facilitate CUGE, we provide a public leaderboard that can be customized to support flexible model judging criteria. Evaluation results on representative pre-trained language models indicate ample room for improvement towards general-purpose language intelligence. CUGE is publicly available at cuge.baai.ac.cn.
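The multi-level scoring idea can be pictured as bottom-up aggregation over the capability-task-dataset hierarchy. The sketch below is illustrative only. It assumes plain unweighted averaging at each level and optional per-capability weights standing in for a "customized" judging criterion. The aggregation rule, the function name `score_levels`, and the capability, task, and dataset names are hypothetical. None of them is CUGE's actual scoring formula or dataset list.

```python
from statistics import mean
from typing import Dict, Optional, Tuple

# Hypothetical hierarchy: capability -> task -> dataset -> score (0-100).
# The averaging rule used here is an assumption, not CUGE's published formula.
Hierarchy = Dict[str, Dict[str, Dict[str, float]]]


def score_levels(
    hierarchy: Hierarchy, weights: Optional[Dict[str, float]] = None
) -> Tuple[Dict[Tuple[str, str], float], Dict[str, float], float]:
    """Return (task scores, capability scores, overall score).

    Task score = mean of its dataset scores; capability score = mean of its
    task scores; overall = weighted mean of capability scores (equal weights
    by default). If given, `weights` must contain every capability.
    """
    task_scores: Dict[Tuple[str, str], float] = {}
    capability_scores: Dict[str, float] = {}
    for capability, tasks in hierarchy.items():
        per_task = []
        for task, datasets in tasks.items():
            task_score = mean(datasets.values())
            task_scores[(capability, task)] = task_score
            per_task.append(task_score)
        capability_scores[capability] = mean(per_task)

    if weights is None:
        weights = {c: 1.0 for c in capability_scores}
    total_weight = sum(weights[c] for c in capability_scores)
    overall = sum(weights[c] * s for c, s in capability_scores.items()) / total_weight
    return task_scores, capability_scores, overall


# Toy example with made-up names and numbers.
hierarchy = {
    "comprehension": {
        "reading": {"dataset_a": 80.0, "dataset_b": 60.0},
        "classification": {"dataset_c": 90.0},
    },
    "generation": {"summarization": {"dataset_d": 40.0}},
}

tasks, capabilities, overall = score_levels(hierarchy)
print(overall)  # → 60.0

# A "customized" view that weights generation twice as heavily.
_, _, weighted = score_levels(hierarchy, {"comprehension": 1.0, "generation": 2.0})
print(weighted)
```

Averaging at each level keeps a capability with many datasets from dominating the overall score. Exposing the capability weights is one simple way a leaderboard could let users re-rank models by the abilities they care about.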

 
