WaveGlow: A Flow-based Generative Network for Speech Synthesis

  • 2018-10-31 03:22:25
  • Ryan Prenger, Rafael Valle, Bryan Catanzaro

Abstract

In this paper we propose WaveGlow: a flow-based network capable of generating high quality speech from mel-spectrograms. WaveGlow combines insights from Glow and WaveNet in order to provide fast, efficient and high-quality audio synthesis, without the need for auto-regression. WaveGlow is implemented using only a single network, trained using only a single cost function: maximizing the likelihood of the training data, which makes the training procedure simple and stable. Our PyTorch implementation produces audio samples at a rate of more than 500 kHz on an NVIDIA V100 GPU. Mean Opinion Scores show that it delivers audio quality as good as the best publicly available WaveNet implementation. All code will be made publicly available online.
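The abstract's key technical claim is that a flow-based model can be trained with a single maximum-likelihood objective and still be sampled without auto-regression. The following is a minimal sketch of the general Glow-style mechanism behind that claim, not the authors' implementation. It assumes a single affine coupling layer, a spherical Gaussian prior with standard deviation `sigma`, and an even-length input. `toy_coupling_net` is a hypothetical stand-in for the WaveNet-like conditioning network. Mel-spectrogram conditioning, invertible 1x1 convolutions, and multiple stacked flows are all omitted.

```python
import math
from typing import Callable, List, Tuple

# A coupling network maps the untouched half x_a to per-element (log_s, t).
CouplingNet = Callable[[List[float]], Tuple[List[float], List[float]]]


def toy_coupling_net(xa: List[float]) -> Tuple[List[float], List[float]]:
    """Hypothetical stand-in for the WaveNet-like network (no mel conditioning)."""
    log_s = [0.1 * math.tanh(v) for v in xa]
    t = [0.5 * v for v in xa]
    return log_s, t


def coupling_forward(x: List[float], net: CouplingNet = toy_coupling_net) -> Tuple[List[float], float]:
    """Affine coupling x -> z; returns z and log|det J| = sum(log_s)."""
    assert len(x) % 2 == 0, "this sketch assumes an even-length input"
    half = len(x) // 2
    xa, xb = x[:half], x[half:]
    log_s, t = net(xa)
    # Only x_b is transformed; x_a passes through unchanged, so J is triangular.
    zb = [b * math.exp(ls) + tt for b, ls, tt in zip(xb, log_s, t)]
    return xa + zb, sum(log_s)


def coupling_inverse(z: List[float], net: CouplingNet = toy_coupling_net) -> List[float]:
    """Inverse z -> x. Every element is recovered in one pass (no auto-regression)."""
    half = len(z) // 2
    za, zb = z[:half], z[half:]
    log_s, t = net(za)  # z_a equals x_a, so the same (log_s, t) are recomputed
    xb = [(b - tt) * math.exp(-ls) for b, ls, tt in zip(zb, log_s, t)]
    return za + xb


def negative_log_likelihood(x: List[float], sigma: float = 1.0,
                            net: CouplingNet = toy_coupling_net) -> float:
    """-log p(x) via change of variables: log p(x) = log N(z; 0, sigma^2 I) + log|det J|."""
    z, log_det = coupling_forward(x, net)
    d = len(z)
    log_pz = (-sum(v * v for v in z) / (2 * sigma ** 2)
              - d * math.log(sigma * math.sqrt(2 * math.pi)))
    return -(log_pz + log_det)


if __name__ == "__main__":
    x = [0.2, -0.4, 0.1, 0.3]
    z, log_det = coupling_forward(x)
    print("z:", z, "log_det:", log_det)
    print("NLL:", negative_log_likelihood(x))
    print("reconstructed x:", coupling_inverse(z))
```

Because the coupling network only ever sees the untouched half, the Jacobian is triangular and its log-determinant reduces to a sum of `log_s`. Likewise, inverting the layer at synthesis time needs no sequential, sample-by-sample loop.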
