Abstract
In this paper, we propose a feature reinforcement method under the sequence-to-sequence neural text-to-speech (TTS) synthesis framework. The proposed method uses a multiple-input encoder to take three levels of text information, i.e., the phoneme sequence, pre-trained word embeddings, and the grammatical structure of sentences from a parser, as the input features for the neural TTS system. The added word- and sentence-level information can be viewed as a feature-based pre-training strategy, which clearly enhances the model's generalization ability. In our experiments on out-of-domain text, the proposed method not only improves system robustness significantly but also raises the synthesized speech to near-recording quality.