RNN-transducer with language bias for end-to-end Mandarin-English code-switching speech recognition

Abstract

Recently, language identity information has been utilized to improve the performance of end-to-end code-switching (CS) speech recognition. However, previous works use an additional language identification (LID) model as an auxiliary module, which makes the system complex. In this work, we propose an improved recurrent neural network transducer (RNN-T) model with language bias to alleviate the problem. We use the language identities to bias the model to predict the CS points. This encourages the model to learn the language identity information directly from the transcription, and no additional LID model is needed. We evaluate the approach on SEAME, a Mandarin-English CS corpus. Compared to our RNN-T baseline, the proposed method achieves 16.2% and 12.9% relative error reduction on two test sets, respectively.
