Unsupervised Domain Adaptation Schemes for Building ASR in Low-resource Languages

Abstract

Building an automatic speech recognition (ASR) system from scratch requires a large amount of annotated speech data, which is difficult to collect in many languages. However, there are cases where the low-resource language shares a common acoustic space with a high-resource language that has enough annotated data to build an ASR system. In such cases, we show that the domain-independent acoustic models learned from the high-resource language through unsupervised domain adaptation (UDA) schemes can enhance the performance of the ASR in the low-resource language. We use the specific example of Hindi in the source domain and Sanskrit in the target domain. We explore two architectures: i) domain adversarial training using a gradient reversal layer (GRL) and ii) domain separation networks (DSN). The GRL and DSN architectures give absolute improvements of 6.71% and 7.32%, respectively, in word error rate over the baseline deep neural network model when trained on just 5.5 hours of data in the target domain. We also show that choosing a proper language (Telugu) in the source domain can bring further improvement. The results suggest that UDA schemes can be helpful in the development of ASR systems for low-resource languages, mitigating the hassle of collecting large amounts of annotated speech data.
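
As a rough illustration of the gradient reversal idea named above, the following is a minimal sketch in plain Python, not the configuration used in this work. It assumes a single logistic domain classifier on top of a two-dimensional feature vector; the function names, the scaling factor `lam`, and all numeric values are hypothetical. The layer acts as the identity in the forward pass and flips the sign of the domain-classifier gradient in the backward pass, so the feature extractor is pushed toward features that the domain classifier cannot separate.

```python
import math


def grl_forward(features):
    """Forward pass: the gradient reversal layer is the identity."""
    return list(features)


def grl_backward(grad_output, lam=1.0):
    """Backward pass: negate the incoming gradient and scale it by lam."""
    return [-lam * g for g in grad_output]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def domain_loss_grad(features, weights, bias, domain_label):
    """Gradient of the binary cross-entropy domain loss with respect to
    the input features of a logistic domain classifier (an assumption)."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    p = sigmoid(z)
    # d(BCE)/dz = p - y; the chain rule through z = w.f + b gives (p - y) * w
    return [(p - domain_label) * w for w in weights]


# Hypothetical values chosen only for illustration.
features = [0.5, -1.0]
weights = [2.0, 1.0]

grad = domain_loss_grad(grl_forward(features), weights, 0.0, domain_label=1.0)
reversed_grad = grl_backward(grad, lam=0.5)

print(grad)           # gradient the domain classifier would pass down
print(reversed_grad)  # gradient the feature extractor actually receives
```

In a full system the reversed gradient would be added to the gradient of the senone classification loss before updating the shared feature layers; that part is omitted here.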
