A Configurable Multilingual Model is All You Need to Recognize All Languages

  • 2021-07-13 06:52:41
  • Long Zhou, Jinyu Li, Eric Sun, Shujie Liu


Multilingual automatic speech recognition (ASR) models have shown great promise in recent years because they simplify model training and deployment. Conventional methods train a universal multilingual model either without any language information or with a one-hot language ID (LID) vector that guides recognition of the target language. In practice, the user can be prompted to pre-select the languages they speak. A multilingual model without LID cannot make good use of the language information set by the user, while a multilingual model with LID can handle only one pre-selected language. In this paper, we propose a novel configurable multilingual model (CMM) that is trained only once but can be configured as different models based on users' choices, by extracting language-specific modules together with a universal model from the trained CMM. In particular, a single CMM can be deployed to any user scenario in which users pre-select any combination of languages. Trained with 75K hours of transcribed, anonymized Microsoft multilingual data and evaluated on 10-language test sets, the proposed CMM improves on the universal multilingual model by 26.0%, 16.9%, and 10.4% relative word error reduction when the user selects 1, 2, or 3 languages, respectively. CMM also performs significantly better on code-switching test sets.
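
The "train once, configure per user" idea can be illustrated with a small Python sketch. It is not the authors' implementation: the class names (TrainedCMM, ConfiguredModel), the toy scale() modules, and above all the rule that language-specific outputs are simply added to the universal output are assumptions made for illustration, since the abstract does not say how the modules are combined.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Module = Callable[[Vector], Vector]


@dataclass(frozen=True)
class ConfiguredModel:
    """Universal module plus only the language modules the user selected."""
    universal: Module
    language_modules: Dict[str, Module]

    def forward(self, features: Vector) -> Vector:
        # ASSUMPTION: language-specific outputs are added to the universal
        # output; the abstract does not specify the combination rule.
        output = list(self.universal(features))
        for module in self.language_modules.values():
            for i, value in enumerate(module(features)):
                output[i] += value
        return output


@dataclass(frozen=True)
class TrainedCMM:
    """Hypothetical container for a CMM that has been trained once."""
    universal: Module
    language_modules: Dict[str, Module]

    def configure(self, selected: Sequence[str]) -> ConfiguredModel:
        # Extract the universal module together with the modules for the
        # languages the user pre-selected; no retraining is involved.
        unknown = [lang for lang in selected if lang not in self.language_modules]
        if unknown:
            raise KeyError(f"no language-specific module for: {unknown}")
        return ConfiguredModel(
            universal=self.universal,
            language_modules={lang: self.language_modules[lang] for lang in selected},
        )


def scale(factor: float) -> Module:
    """Toy stand-in for a trained module: multiplies each feature by a constant."""
    return lambda features: [factor * x for x in features]


# One trained model, here with three toy language-specific modules.
cmm = TrainedCMM(
    universal=scale(1.0),
    language_modules={"en": scale(0.1), "de": scale(0.2), "zh": scale(0.3)},
)

# Two different deployments configured from the same trained model.
bilingual = cmm.configure(["en", "de"])
universal_only = cmm.configure([])
```

In this sketch, configuring with an empty selection falls back to the universal module alone, while any subset of the trained languages yields a smaller model that carries only the modules it needs.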


