WaveGrad: Estimating Gradients for Waveform Generation

Abstract

This paper introduces WaveGrad, a conditional model for waveform generation through estimating gradients of the data density. The model builds on prior work on score matching and diffusion probabilistic models. It starts from Gaussian white noise and iteratively refines the signal via a gradient-based sampler conditioned on the mel-spectrogram. WaveGrad is non-autoregressive and requires only a constant number of generation steps during inference. It can use as few as 6 iterations to generate high-fidelity audio samples. WaveGrad is simple to train and implicitly optimizes a weighted variational lower bound of the log-likelihood. Experiments show that WaveGrad generates high-fidelity audio samples matching a strong likelihood-based autoregressive baseline with fewer sequential operations.
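
The iterative refinement described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the paper's implementation: the update rule is the standard ancestral sampling step from diffusion probabilistic models, `predict_noise` is a hypothetical placeholder for a trained network (it simply returns zeros), and the noise schedule `betas` and `hop_length` are illustrative values.

```python
import numpy as np


def predict_noise(y, mel, noise_level):
    """Hypothetical stand-in for a trained noise-prediction network.

    A real model would use `mel` and `noise_level`; this placeholder
    ignores them and returns an all-zero noise estimate.
    """
    return np.zeros_like(y)


def sample(mel, betas, hop_length=300, seed=0, eps_fn=predict_noise):
    """Refine Gaussian white noise into a waveform in len(betas) steps.

    mel:        array of shape (n_mels, n_frames), the conditioning signal
    betas:      assumed noise schedule, one value per refinement step
    hop_length: assumed number of waveform samples per mel frame
    """
    rng = np.random.default_rng(seed)
    betas = np.asarray(betas, dtype=np.float64)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    n_samples = mel.shape[-1] * hop_length
    y = rng.standard_normal(n_samples)  # start from Gaussian white noise

    # Walk the schedule backwards: a constant number of sequential steps,
    # independent of the waveform length.
    for n in range(len(betas) - 1, -1, -1):
        eps = eps_fn(y, mel, np.sqrt(alpha_bars[n]))
        # Remove the estimated noise component, then rescale.
        y = (y - betas[n] / np.sqrt(1.0 - alpha_bars[n]) * eps) / np.sqrt(alphas[n])
        if n > 0:
            # Re-inject a smaller amount of noise on all but the final step.
            sigma = np.sqrt((1.0 - alpha_bars[n - 1]) / (1.0 - alpha_bars[n]) * betas[n])
            y = y + sigma * rng.standard_normal(n_samples)
    return y


if __name__ == "__main__":
    mel = np.zeros((80, 4))             # dummy conditioning: 80 bins, 4 frames
    betas = np.linspace(1e-4, 0.5, 6)   # 6 iterations, illustrative schedule
    audio = sample(mel, betas)
    print(audio.shape)
```

With the placeholder predictor the output is only rescaled noise; the sketch is meant to show the control flow, in which every step updates all waveform samples in parallel and the number of sequential steps is fixed by the schedule length.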
