Phoneme Recognition through Fine Tuning of Phonetic Representations: a Case Study on Luhya Language Varieties

Abstract

Models pre-trained on multiple languages have shown significant promise for improving speech recognition, particularly for low-resource languages. In this work, we focus on phoneme recognition using Allosaurus, a method for multilingual recognition based on phonetic annotation. Allosaurus incorporates phonological knowledge through a language-dependent allophone layer that associates a universal narrow phone set with the phonemes that appear in each language. To evaluate the approach in a challenging real-world setting, we curate phone recognition datasets for Bukusu and Saamia, two varieties of the Luhya language cluster of western Kenya and eastern Uganda. To our knowledge, these datasets are the first of their kind. We carry out similar experiments on the dataset of an endangered Tangkhulic language, East Tusom, a Tibeto-Burman language variety spoken mostly in India. We explore both zero-shot recognition and few-shot recognition, the latter by fine-tuning on datasets of varying sizes (10 to 1000 utterances). We find that fine-tuning Allosaurus, even with just 100 utterances, leads to significant improvements in phone error rates.
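The two technical ingredients mentioned above, an allophone layer that maps universal phones to language-specific phonemes and the phone error rate (PER) used for evaluation, can be illustrated with a small sketch. This is not the Allosaurus code. The phone inventory, the phoneme-to-allophone mapping, and the helper names (`allophone_matrix`, `phoneme_logits`, `phone_error_rate`) are hypothetical. Scoring each phoneme by the maximum logit over its allophones follows our understanding of the original Allosaurus formulation and should be treated as an assumption. PER is computed here as the Levenshtein distance between phone sequences divided by the reference length.

```python
import numpy as np

# Hypothetical universal phone inventory (a tiny illustrative subset,
# not the real Allosaurus inventory).
PHONES = ["p", "b", "β", "t", "d", "l", "ɾ", "a", "e"]


def allophone_matrix(phones, phoneme_to_allophones):
    """Build a boolean phone-by-phoneme matrix S where S[i, j] is True
    if universal phone i is an allophone of phoneme j in the target language."""
    phonemes = list(phoneme_to_allophones)
    index = {p: i for i, p in enumerate(phones)}
    S = np.zeros((len(phones), len(phonemes)), dtype=bool)
    for j, phoneme in enumerate(phonemes):
        for allophone in phoneme_to_allophones[phoneme]:
            S[index[allophone], j] = True
    return phonemes, S


def phoneme_logits(phone_logits, S):
    """Map per-frame universal phone logits (T x P) to language-specific
    phoneme logits (T x Q): each phoneme takes the max logit over its
    allophones (assumed max-pooling aggregation)."""
    # Broadcast to (T, P, Q); phones that are not allophones get -inf.
    masked = np.where(S[None, :, :], phone_logits[:, :, None], -np.inf)
    return masked.max(axis=1)


def phone_error_rate(ref, hyp):
    """Levenshtein distance (substitutions, insertions, deletions)
    between phone sequences, divided by the reference length."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up mapping for illustration, not the actual Bukusu or Saamia inventory.
    mapping = {"b": ["b", "β"], "l": ["l", "ɾ"], "a": ["a"]}
    phonemes, S = allophone_matrix(PHONES, mapping)

    logits = np.zeros((2, len(PHONES)))
    logits[0, PHONES.index("β")] = 3.0  # frame 0: [β] is most likely
    logits[1, PHONES.index("ɾ")] = 2.0  # frame 1: [ɾ] is most likely
    out = phoneme_logits(logits, S)
    decoded = [phonemes[k] for k in out.argmax(axis=1)]
    print(decoded)  # ['b', 'l']
    print(phone_error_rate(list("bala"), list("bal")))  # 0.25
```

Because the mapping is a separate, language-dependent matrix, the universal phone model can stay shared across languages while only the phone-to-phoneme association changes per language.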
