Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition

Abstract

Automatic speech recognition (ASR) systems degrade significantly under noisy conditions. Recently, speech enhancement (SE) has been introduced as a front-end to reduce noise for ASR, but it also suppresses some important speech information, i.e., over-suppression. To alleviate this, we propose a dual-path style learning approach for end-to-end noise-robust speech recognition (DPSL-ASR). Specifically, we first introduce the clean speech feature along with the fused feature from IFF-Net as dual-path inputs to recover the suppressed information. Then, we propose style learning to map the fused feature close to the clean feature, in order to learn latent speech information from the latter, i.e., clean "speech style". Furthermore, we also minimize the distance between the final ASR outputs of the two paths to improve noise-robustness. Experiments show that the proposed approach achieves relative word error rate (WER) reductions of 10.6% and 8.6% over the best IFF-Net baseline on the RATS and CHiME-4 datasets, respectively.
