Language Bias in Self-Supervised Learning For Automatic Speech Recognition

Abstract

Self-supervised learning (SSL) is used in deep learning to train on large datasets without the need for expensive labelling of the data. Recently, large Automatic Speech Recognition (ASR) models such as XLS-R have utilised SSL to train on over one hundred different languages simultaneously. However, deeper investigation shows that the bulk of the training data for XLS-R comes from a small number of languages. Biases learned through SSL have been shown to exist in multiple domains, but language bias in multilingual SSL ASR has not been thoroughly examined. In this paper, we utilise the Lottery Ticket Hypothesis (LTH) to identify language-specific subnetworks within XLS-R and test the performance of these subnetworks on a variety of different languages. We are able to show that when fine-tuning, XLS-R bypasses traditional linguistic knowledge and builds only on weights learned from the languages with the largest data contribution to the pretraining data.
