Abstract
Recently, Mixture of Experts (MoE) architectures, such as LR-MoE, have often been used to alleviate the impact of language confusion on the multilingual ASR (MASR) task. However, LR-MoE still suffers from language confusion, especially in mismatched-domain scenarios. In this paper, we decouple language confusion in LR-MoE into confusion in self-attention and confusion in the router. To alleviate language confusion in self-attention, we propose, based on LR-MoE, to apply an attention-MoE architecture to MASR. In our new architecture, MoE is applied not only to the feed-forward network (FFN) but also to self-attention. In addition, to improve the robustness of the LID-based router to language confusion, we propose expert pruning and router augmentation methods. Combining the above, we obtain the boosted language-routing MoE (BLR-MoE) architecture. We verify the effectiveness of the proposed BLR-MoE on a 10,000-hour MASR dataset.
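As a rough illustration of the routing idea, the following toy sketch shows a language-routed block in which both the self-attention and the FFN are selected per language, together with one plausible reading of expert pruning: restricting the router to a known subset of languages before taking the argmax. Everything here is an assumption made for illustration. The names `route_language` and `moe_block`, the scalar stand-in experts, and the utterance-level routing are hypothetical and are not taken from the paper; router augmentation is not shown.

```python
from typing import Callable, Dict, Optional, Sequence, Set

# Toy stand-in for a per-language sub-network (assumption: real experts
# would be self-attention or FFN modules operating on feature tensors).
Expert = Callable[[float], float]


def route_language(
    lid_logits: Sequence[float],
    languages: Sequence[str],
    allowed: Optional[Set[str]] = None,
) -> str:
    """Pick the language whose LID logit is highest.

    If `allowed` is given, languages outside it are pruned from the
    candidates first (assumed form of expert pruning).
    """
    candidates = [
        i for i, lang in enumerate(languages)
        if allowed is None or lang in allowed
    ]
    if not candidates:
        raise ValueError("expert pruning removed every language")
    best = max(candidates, key=lambda i: lid_logits[i])
    return languages[best]


def moe_block(
    x: float,
    lang: str,
    attn_experts: Dict[str, Expert],
    ffn_experts: Dict[str, Expert],
) -> float:
    """Apply the routed attention expert, then the routed FFN expert."""
    h = attn_experts[lang](x)
    return ffn_experts[lang](h)


LANGS = ["en", "zh", "ja"]
attn_experts: Dict[str, Expert] = {
    "en": lambda x: x + 1.0,
    "zh": lambda x: x + 2.0,
    "ja": lambda x: x + 3.0,
}
ffn_experts: Dict[str, Expert] = {
    "en": lambda x: x * 10.0,
    "zh": lambda x: x * 20.0,
    "ja": lambda x: x * 30.0,
}

# Hypothetical LID logits in which "zh" narrowly beats "ja".
lid_logits = [0.2, 1.5, 1.4]

unpruned = route_language(lid_logits, LANGS)
pruned = route_language(lid_logits, LANGS, allowed={"en", "ja"})
output = moe_block(1.0, pruned, attn_experts, ffn_experts)
```

In this sketch, pruning changes the routing decision only when the top-scoring language lies outside the allowed set, which is the situation where a confused router would otherwise send the input to the wrong experts.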