Speaker independence of neural vocoders and their effect on parametric resynthesis speech enhancement

Abstract

Traditional speech enhancement systems produce speech with compromised quality. Here we propose to use the high quality speech generation capability of neural vocoders for better quality speech enhancement. We term this parametric resynthesis (PR). In previous work, we showed that PR systems generate high quality speech for a single speaker using two neural vocoders, WaveNet and WaveGlow. Both of these vocoders are traditionally speaker dependent. Here we first show that when trained on data from enough speakers, these vocoders can generate speech from unseen speakers, both male and female, with quality similar to that of speakers seen in training. Next, using these two vocoders and a new vocoder, LPCNet, we evaluate the noise reduction quality of PR on unseen speakers and show that objective signal and overall quality are higher than those of the state-of-the-art speech enhancement systems Wave-U-Net, Wavenet-denoise, and SEGAN. Moreover, in subjective quality, multiple-speaker PR outperforms the oracle Wiener mask.
