Learning pronunciation from a foreign language in speech synthesis networks

Abstract

Although there are more than 6,500 languages in the world, the pronunciations of many phonemes sound similar across languages. When people learn a foreign language, their pronunciation often reflects their native language's characteristics. This motivates us to investigate how a speech synthesis network learns pronunciation when given a multilingual dataset. In this study, we train a speech synthesis network bilingually in English and Korean, and analyze how the network learns the relations of phoneme pronunciation between the two languages. Our experimental results show that the learned phoneme embedding vectors are located closer together when their pronunciations are similar across the languages. Based on this result, we also show that it is possible to train networks that synthesize an English speaker's Korean speech and vice versa. In another experiment, we train the network with a limited amount of English data and a large Korean dataset, and analyze the amount of data required to train a resource-poor language with the help of resource-rich languages.
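
To make the embedding claim concrete, one way to check whether phonemes from two languages lie close together is to look up, for each phoneme in one language, its nearest neighbor in the other under cosine similarity. The sketch below is illustrative only: the function names, the phoneme labels, and the three-dimensional vectors are all hypothetical and invented for this example. They are not taken from the trained network, and the abstract does not state which distance measure the authors used.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_cross_lingual(query, source, target):
    """Return the label in `target` whose embedding is most similar to source[query]."""
    query_vec = source[query]
    return max(target, key=lambda label: cosine_similarity(query_vec, target[label]))


# Hypothetical toy embeddings (assumption: invented values, not learned ones).
english = {
    "en_m": [0.9, 0.1, 0.0],
    "en_s": [0.1, 0.9, 0.1],
    "en_aa": [0.0, 0.2, 0.9],
}
korean = {
    "ko_m": [0.8, 0.2, 0.1],
    "ko_s": [0.2, 0.8, 0.0],
    "ko_a": [0.1, 0.1, 0.95],
}

for phoneme in english:
    print(phoneme, "->", nearest_cross_lingual(phoneme, english, korean))
```

With real learned embeddings, the same lookup would show which Korean phoneme the network places nearest to each English phoneme, which can then be compared against phonetic expectations.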
