Massively Multilingual ASR: 50 Languages, 1 Model, 1 Billion Parameters

  • 2020-07-06 18:43:38
  • Vineel Pratap, Anuroop Sriram, Paden Tomasello, Awni Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert

Abstract

We study training a single acoustic model for multiple languages with the aim of improving automatic speech recognition (ASR) performance on low-resource languages, and overall simplifying deployment of ASR systems that support diverse languages. We perform an extensive benchmark on 51 languages, with varying amounts of training data per language (from 100 hours to 1,100 hours). We compare three variants of multilingual training, from a single joint model without knowing the input language, to using this information, to multiple heads (one per language cluster). We show that multilingual training of ASR models on several languages can improve recognition performance, in particular on low-resource languages. We see average relative WER reductions of 20.9%, 23% and 28.8% compared to monolingual baselines for the joint model, the joint model with language input, and the multi-head model, respectively. To our knowledge, this is the first work studying multilingual ASR at massive scale, with more than 50 languages and more than 16,000 hours of audio across them.
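
The "average WER relative reduction" figures compare each multilingual model against a monolingual baseline. A minimal sketch of how such a number might be computed is given here, assuming the average is taken over per-language relative reductions (the abstract does not state the exact aggregation). The function names, language labels, and WER values are hypothetical and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, model_wer: float) -> float:
    """Relative WER reduction of a model over a baseline, in percent."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - model_wer) / baseline_wer


def average_relative_reduction(pairs) -> float:
    """Mean of per-language relative reductions (assumed aggregation)."""
    reductions = [relative_wer_reduction(b, m) for b, m in pairs]
    return sum(reductions) / len(reductions)


# Hypothetical (monolingual WER, multilingual WER) pairs in percent;
# illustrative values only, not results from the paper.
example = {
    "lang_a": (40.0, 30.0),  # 25% relative reduction
    "lang_b": (20.0, 18.0),  # 10% relative reduction
}

avg = average_relative_reduction(example.values())
print(f"average relative WER reduction: {avg:.1f}%")
```

Note that a relative reduction is measured against the baseline WER, so going from 40% to 30% WER is a 25% relative reduction even though the absolute drop is 10 points.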

 
