Parametric Resynthesis with neural vocoders

Abstract

Noise suppression systems generally produce output speech with compromised quality. We propose to utilize the high quality speech generation capability of neural vocoders for noise suppression. We use a neural network to predict clean mel-spectrogram features from noisy speech and then compare two neural vocoders, WaveNet and WaveGlow, for synthesizing clean speech from the predicted mel spectrogram. Both WaveNet and WaveGlow achieve better subjective and objective quality scores than the source separation model Chimera++. Further, WaveNet and WaveGlow also achieve significantly better subjective quality ratings than the oracle Wiener mask. Moreover, we observe that between WaveNet and WaveGlow, WaveNet achieves the best subjective quality scores, although at the cost of much slower waveform generation.
