Neural Machine Translation for Multilingual Grapheme-to-Phoneme Conversion

  • 2020-06-25 06:16:29
  • Alex Sokolov, Tracy Rohlin, Ariya Rastrow
Grapheme-to-phoneme (G2P) models are a key component in Automatic Speech Recognition (ASR) systems, such as the ASR system in Alexa, as they are used to generate pronunciations for out-of-vocabulary words that do not exist in the pronunciation lexicons (mappings like "e c h o" to "E k oU"). Most G2P systems are monolingual and based on traditional joint-sequence n-gram models [1,2]. As an alternative, we present a single end-to-end trained neural G2P model that shares the same encoder and decoder across multiple languages. This allows the model to utilize a combination of universal symbol inventories of Latin-like alphabets and cross-linguistically shared feature representations. Such a model is especially useful in the scenarios of low-resource languages and code switching/foreign words, where the pronunciations in one language need to be adapted to other locales or accents. We further experiment with a word language distribution vector as an additional training target in order to improve system performance by helping the model decouple pronunciations across a variety of languages in the parameter space. We show a 7.2% average improvement in phoneme error rate over low-resource languages and no degradation over high-resource ones compared to monolingual baselines.
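
The abstract reports results as phoneme error rate (PER) but does not define how it is computed. A minimal sketch follows, assuming the common definition: the Levenshtein edit distance between reference and hypothesized phoneme sequences, summed over all words and divided by the total number of reference phonemes. The function names and the sample hypothesis are hypothetical illustrations, not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences (lists of symbols)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(pairs):
    """PER over (reference, hypothesis) pairs: total edits / total reference phonemes."""
    total_edits = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total_ref = sum(len(ref) for ref, _ in pairs)
    return total_edits / total_ref


# Hypothetical example: the lexicon entry "E k oU" against a hypothesis
# that gets one of the three phonemes wrong.
pairs = [("E k oU".split(), "E g oU".split())]
print(phoneme_error_rate(pairs))
```

Pooling edits over the whole test set, rather than averaging per-word rates, keeps short words from dominating the score. Whether the paper pools or averages is not stated, so this choice is an assumption.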


