Multilingual Speech Recognition With A Single End-To-End Model

  • 2018-02-15 08:59:27
  • Shubham Toshniwal, Tara N. Sainath, Ron J. Weiss, Bo Li, Pedro Moreno, Eugene Weinstein, Kanishka Rao

Abstract

Training a conventional automatic speech recognition (ASR) system to support multiple languages is challenging because the sub-word unit, lexicon and word inventories are typically language specific. In contrast, sequence-to-sequence models are well suited for multilingual ASR because they encapsulate an acoustic, pronunciation and language model jointly in a single network. In this work we present a single sequence-to-sequence ASR model trained on 9 different Indian languages, which have very little overlap in their scripts. Specifically, we take a union of language-specific grapheme sets and train a grapheme-based sequence-to-sequence model jointly on data from all languages. We find that this model, which is not explicitly given any information about language identity, improves recognition performance by 21% relative compared to analogous sequence-to-sequence models trained on each language individually. By modifying the model to accept a language identifier as an additional input feature, we further improve performance by an additional 7% relative and eliminate confusion between different languages.
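The two data-side ideas in the abstract, a shared output vocabulary built as the union of per-language grapheme sets and a language identifier supplied as an extra input feature, can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper: the function names, the toy transcripts, the language codes, the use of individual Unicode code points as a stand-in for graphemes, and the choice to append a one-hot language vector to every acoustic frame. The abstract does not say how or where the identifier is fed to the network.

```python
from typing import Dict, List, Sequence


def build_grapheme_vocab(transcripts_by_lang: Dict[str, List[str]]) -> Dict[str, int]:
    """Map every symbol seen in any language to one shared integer id.

    Assumption: each Unicode code point is treated as one grapheme.
    """
    graphemes = set()
    for transcripts in transcripts_by_lang.values():
        for text in transcripts:
            graphemes.update(text)  # adds each code point of the transcript
    # Sorting makes the symbol-to-id mapping deterministic.
    return {g: i for i, g in enumerate(sorted(graphemes))}


def append_language_id(
    frames: Sequence[Sequence[float]], lang: str, languages: Sequence[str]
) -> List[List[float]]:
    """Append a one-hot language vector to every acoustic feature frame.

    Assumption: the identifier is concatenated to the input features;
    other injection points are possible.
    """
    one_hot = [1.0 if name == lang else 0.0 for name in languages]
    return [list(frame) + one_hot for frame in frames]


# Hypothetical toy data: one short transcript per language.
toy_transcripts = {"hi": ["न म"], "ta": ["வ ண"]}
vocab = build_grapheme_vocab(toy_transcripts)

# One fake two-dimensional acoustic frame, tagged as Tamil.
tagged = append_language_id([[0.1, 0.2]], "ta", ["hi", "ta"])
```

In this toy case the two scripts share only the space character, so the joint vocabulary is barely smaller than the sum of the per-language sets, which mirrors the abstract's remark that the scripts overlap very little.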

 
