PrOnto: Language Model Evaluations for 859 Languages

Abstract

Evaluation datasets are critical resources for measuring the quality of pretrained language models. However, because dataset annotation is costly, these resources are scarce for most languages other than English, which makes it difficult to assess the quality of language models in those languages. In this work, we present a new method for evaluation dataset construction that enables any language with a New Testament translation to receive a suite of evaluation datasets suitable for pretrained language model evaluation. The method critically involves aligning verses with those in the New Testament portion of English OntoNotes and then projecting annotations from English to the target language, with no manual annotation required. We apply this method to 1051 New Testament translations in 859 languages and make the resulting datasets publicly available. Additionally, we conduct experiments that demonstrate the efficacy of our method for creating evaluation tasks that can assess language model quality.
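
The core idea of verse alignment followed by annotation projection can be illustrated with a minimal sketch. It assumes, hypothetically, that each English OntoNotes annotation has already been reduced to a single verse-level label. It also assumes that both the English and target-language texts are keyed by a (book, chapter, verse) reference. The function name `project_verse_labels`, the `VerseRef` layout, and the toy data are illustrative only and are not the authors' implementation. The sketch simply joins the two texts on shared verse references and carries each English label over to the aligned target-language verse. It ignores versification differences between translations, such as split or merged verses.

```python
from typing import Dict, List, Tuple

# Hypothetical verse reference layout: (book, chapter, verse), e.g. ("MAT", 1, 1).
VerseRef = Tuple[str, int, int]


def project_verse_labels(
    english_labels: Dict[VerseRef, str],
    target_verses: Dict[VerseRef, str],
) -> List[Tuple[VerseRef, str, str]]:
    """Align target-language verses to English verses by shared reference and
    copy each English verse-level label onto the aligned target verse.

    Verses present on only one side are skipped, because no alignment exists.
    """
    projected = []
    # Sort references so the output order is deterministic.
    for ref in sorted(target_verses):
        if ref in english_labels:
            projected.append((ref, target_verses[ref], english_labels[ref]))
    return projected


if __name__ == "__main__":
    # Toy placeholder data, purely for illustration.
    english = {("MAT", 1, 1): "A", ("MAT", 1, 2): "B"}
    target = {("MAT", 1, 2): "target text 1:2", ("MAT", 1, 3): "target text 1:3"}
    print(project_verse_labels(english, target))
    # [(('MAT', 1, 2), 'target text 1:2', 'B')]
```

No annotator touches the target language here. The target-language label for a verse comes entirely from the English annotation of the verse it aligns with, which is why the approach extends to any language that has a New Testament translation.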
