ElitePLM: An Empirical Study on General Language Ability Evaluation of Pretrained Language Models

Abstract

Nowadays, pretrained language models (PLMs) have dominated the majority of NLP tasks. However, little research has been conducted on systematically evaluating the language abilities of PLMs. In this paper, we present a large-scale empirical study on general language ability evaluation of PLMs (ElitePLM). In our study, we design four evaluation dimensions, i.e., memory, comprehension, reasoning, and composition, to measure ten widely used PLMs within five categories. Our empirical results demonstrate that: (1) PLMs with varying training objectives and strategies are good at different ability tests; (2) fine-tuning PLMs in downstream tasks is usually sensitive to the data size and distribution; (3) PLMs have excellent transferability between similar tasks. Moreover, the prediction results of PLMs in our experiments are released as an open resource for deeper and more detailed analysis of the language abilities of PLMs. This paper can guide future work to select, apply, and design PLMs for specific tasks. We have made all the details of experiments publicly available at https://github.com/RUCAIBox/ElitePLM.
