End-to-End Code-Switching ASR for Low-Resourced Language Pairs

Abstract

Despite the significant progress in end-to-end (E2E) automatic speech recognition (ASR), E2E ASR for low-resourced code-switching (CS) speech has not been well studied. In this work, we describe an E2E ASR pipeline for the recognition of CS speech in which a low-resourced language is mixed with a high-resourced language. Low-resourcedness in acoustic data hinders the performance of E2E ASR systems more severely than that of conventional ASR systems. To mitigate this problem in the transcription of archives with code-switching Frisian-Dutch speech, we integrate a designated decoding scheme and perform rescoring with neural network-based language models to enable better utilization of the available textual resources. We first incorporate a multi-graph decoding approach which creates parallel search spaces for each of the monolingual and mixed recognition tasks to maximize the utilization of the textual resources from each language. Further, language model rescoring is performed using a recurrent neural network pre-trained with cross-lingual embedding and further adapted with the limited amount of in-domain CS text. The ASR experiments demonstrate the effectiveness of the described techniques in improving the recognition performance of an E2E CS ASR system in a low-resourced scenario.
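
The abstract does not state how the language model scores are combined with the first-pass scores. As an illustration only, the sketch below assumes n-best rescoring with a log-linear combination, which is one common way to apply a neural language model after decoding. The names `rescore_nbest`, `lm_logprob`, and `lm_weight` are hypothetical and do not come from the paper.

```python
from typing import Callable, List, Tuple

# A hypothesis is (transcript, first-pass log score). Hypothetical type alias.
Hypothesis = Tuple[str, float]


def rescore_nbest(
    nbest: List[Hypothesis],
    lm_logprob: Callable[[str], float],
    lm_weight: float = 0.5,
) -> List[Hypothesis]:
    """Re-rank an n-best list by adding a weighted LM log-probability.

    Assumption: scores are log-domain and combined log-linearly;
    the paper may use a different scheme (e.g. lattice rescoring).
    """
    rescored = [
        (text, score + lm_weight * lm_logprob(text)) for text, score in nbest
    ]
    # Highest combined score first.
    return sorted(rescored, key=lambda hyp: hyp[1], reverse=True)
```

In practice `lm_logprob` would wrap the adapted recurrent language model, and `lm_weight` would be tuned on held-out CS data.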
