Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization

  • 2021-11-29 23:14:54
  • Brian Yan, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Siddharth Dalmia, Dan Berrebbi, Chao Weng, Shinji Watanabe, Dong Yu
  • 27


Conversational bilingual speech encompasses three types of utterances: two purely monolingual types and one intra-sententially code-switched type. In this work, we propose a general framework to jointly model the likelihoods of the monolingual and code-switch sub-tasks that comprise bilingual speech recognition. By defining the monolingual sub-tasks with label-to-frame synchronization, our joint modeling framework can be conditionally factorized such that the final bilingual output, which may or may not be code-switched, is obtained given only monolingual information. We show that this conditionally factorized joint framework can be modeled by an end-to-end differentiable neural network. We demonstrate the efficacy of our proposed model on bilingual Mandarin-English speech recognition across both monolingual and code-switched corpora.
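As a rough illustration of what "obtaining the bilingual output given only monolingual information" can look like under label-to-frame synchronization, the following sketch combines two frame-synchronous, CTC-style monolingual posteriors (one Mandarin, one English, each with its own blank) into a single bilingual posterior and then greedily decodes it. This is a toy under stated assumptions, not the authors' model. The paper models this step with a learned end-to-end network, whereas here the combination is a fixed, hypothetical product rule (`combine_frames`). The vocabularies, the blank convention and the probabilities are also invented for illustration.

```python
import numpy as np

BLANK = "<b>"  # hypothetical blank symbol shared by both monolingual branches


def combine_frames(p_man, p_eng):
    """Hypothetical fixed combination of two frame-synchronous posteriors.

    p_man: (T, 1 + V_man) array, column 0 is blank, rows sum to 1.
    p_eng: (T, 1 + V_eng) array, column 0 is blank, rows sum to 1.
    Returns a (T, 1 + V_man + V_eng) bilingual posterior with rows summing to 1.
    Mass where both branches emit a token on the same frame is dropped, then
    the remaining mass is renormalized per frame.
    """
    blank = p_man[:, :1] * p_eng[:, :1]  # both branches emit blank
    man = p_man[:, 1:] * p_eng[:, :1]    # Mandarin token while English is blank
    eng = p_eng[:, 1:] * p_man[:, :1]    # English token while Mandarin is blank
    joint = np.concatenate([blank, man, eng], axis=1)
    return joint / joint.sum(axis=1, keepdims=True)


def greedy_ctc(post, vocab):
    """Greedy CTC decoding: frame-wise argmax, collapse repeats, drop blanks."""
    out, prev = [], None
    for idx in post.argmax(axis=1):
        if idx != prev and idx != 0:  # index 0 is the blank
            out.append(vocab[idx])
        prev = idx
    return out


# Toy 4-frame example with made-up posteriors.
vocab_man = ["我", "喜欢"]
vocab_eng = ["coffee"]
p_man = np.array([[0.1, 0.8, 0.1],
                  [0.2, 0.1, 0.7],
                  [0.9, 0.05, 0.05],
                  [0.9, 0.05, 0.05]])
p_eng = np.array([[0.9, 0.1],
                  [0.9, 0.1],
                  [0.2, 0.8],
                  [0.95, 0.05]])

joint = combine_frames(p_man, p_eng)
vocab = [BLANK] + vocab_man + vocab_eng
hyp = greedy_ctc(joint, vocab)
print(hyp)  # ['我', '喜欢', 'coffee']
```

If the English branch emitted blank on every frame, the same code would return a purely Mandarin hypothesis, which mirrors the idea that one bilingual output covers both monolingual and code-switched utterances.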


