Investigating Cross-Domain Losses for Speech Enhancement

Abstract

Recent years have seen a surge in the number of available frameworks for speech enhancement (SE) and recognition. Whether model-based or constructed via deep learning, these frameworks typically rely exclusively on either time-domain signals or time-frequency (TF) representations of speech data. In this study, we investigate the advantages of each set of approaches by separately examining their impact on speech intelligibility and quality. Furthermore, we combine the complementary benefits of time-domain and TF speech representations by introducing two new cross-domain SE frameworks. A quantitative comparative analysis against recent model-based and deep learning SE approaches is performed to illustrate the merit of the proposed frameworks.
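The abstract does not state how the two cross-domain frameworks combine the representations. As a rough illustration of the general idea of a cross-domain loss, the sketch below assumes a weighted sum of a time-domain mean squared error and a mean squared error between short-time Fourier transform (STFT) magnitudes. The weight `alpha`, the Hann-windowed STFT, and the `frame_len`/`hop` values are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def stft_magnitude(x, frame_len=512, hop=128):
    """Magnitude STFT with a Hann window (hypothetical settings; trailing samples are dropped)."""
    if len(x) < frame_len:
        raise ValueError("signal must be at least frame_len samples long")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Slice the signal into overlapping windowed frames: shape (n_frames, frame_len).
    frames = np.stack([x[i * hop : i * hop + frame_len] * window for i in range(n_frames)])
    # Real FFT per frame, keep magnitudes only: shape (n_frames, frame_len // 2 + 1).
    return np.abs(np.fft.rfft(frames, axis=1))

def cross_domain_loss(clean, enhanced, alpha=0.5, frame_len=512, hop=128):
    """Hypothetical cross-domain loss: alpha * time-domain MSE + (1 - alpha) * TF-magnitude MSE."""
    time_loss = np.mean((clean - enhanced) ** 2)
    tf_loss = np.mean(
        (stft_magnitude(clean, frame_len, hop) - stft_magnitude(enhanced, frame_len, hop)) ** 2
    )
    return alpha * time_loss + (1 - alpha) * tf_loss

# Usage: a 1 s, 440 Hz tone at 16 kHz versus a noisy copy of it.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
print(cross_domain_loss(clean, clean))      # 0.0
print(cross_domain_loss(clean, noisy) > 0)  # True
```

Setting `alpha=1` reduces the sketch to a purely time-domain loss and `alpha=0` to a purely TF-magnitude loss, which mirrors the single-domain setups the abstract contrasts; intermediate values blend the two error signals.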
