Evaluating Transferability of BERT Models on Uralic Languages

Abstract

Transformer-based language models such as BERT have outperformed previous models on a large number of English benchmarks, but their evaluation is often limited to English or a small number of well-resourced languages. In this work, we evaluate monolingual, multilingual, and randomly initialized language models from the BERT family on a variety of Uralic languages including Estonian, Finnish, Hungarian, Erzya, Moksha, Karelian, Livvi, Komi Permyak, Komi Zyrian, Northern Sámi, and Skolt Sámi. When monolingual models are available (currently only et, fi, hu), these perform better on their native language, but in general they transfer worse than multilingual models or models of genetically unrelated languages that share the same character set. Remarkably, straightforward transfer of high-resource models, even without special efforts toward hyperparameter optimization, yields what appear to be state-of-the-art POS and NER tools for the minority Uralic languages where there is sufficient data for finetuning.
