Evaluating Standard and Dialectal Frisian ASR: Multilingual Fine-tuning and Language Identification for Improved Low-resource Performance

Abstract

Automatic Speech Recognition (ASR) performance for low-resource languages is still far behind that of higher-resource languages such as English, due to a lack of sufficient labeled data. State-of-the-art methods deploy self-supervised transfer learning, where a model pre-trained on large amounts of data is fine-tuned using little labeled data in a target low-resource language. In this paper, we present and examine a method for fine-tuning an SSL-based model in order to improve performance for Frisian and its regional dialects (Clay Frisian, Wood Frisian, and South Frisian). We show that Frisian ASR performance can be improved by using multilingual (Frisian, Dutch, English and German) fine-tuning data and an auxiliary language identification task. In addition, our findings show that performance on dialectal speech suffers substantially and, importantly, that this effect is moderated by the elicitation approach used to collect the dialectal data. Our findings also suggest that relying solely on standard language data for ASR evaluation may underestimate real-world performance, particularly in languages with substantial dialectal variation.
