Using recurrences in time and frequency within U-net architecture for speech enhancement

Abstract

When designing a fully-convolutional neural network, there is a trade-off between receptive field size, number of parameters, and spatial resolution of features in deeper layers of the network. In this work we present a novel network design, based on a combination of many convolutional and recurrent layers, that resolves this trade-off. We compare our solution with U-net-based models known from the literature and with other baseline models on a speech enhancement task. We test our solution on TIMIT speech utterances combined with noise segments extracted from the NOISEX-92 database and show a clear advantage of the proposed solution over the current state of the art in terms of SDR (signal-to-distortion ratio), SIR (signal-to-interference ratio) and STOI (short-time objective intelligibility) metrics.
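The receptive-field / parameter / resolution trade-off for a purely convolutional stack can be made concrete with the standard receptive-field recurrence. The following is a minimal sketch, not the authors' model: it assumes hypothetical 1-D convolutions with kernel size 3, 64 channels and "same" padding, and compares stride-1 stacks (full resolution, receptive field grows only linearly with depth) against a strided stack (large receptive field for few parameters, but coarse output resolution). The helper names `ConvLayer`, `stack` and `summarize` are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass
class ConvLayer:
    kernel: int  # kernel size along the analysed axis
    stride: int  # stride; 2 models a downsampling step as in a U-net encoder
    in_ch: int   # input channels
    out_ch: int  # output channels


def summarize(layers, input_len):
    """Return (receptive field, parameter count, output length) of a 1-D conv stack."""
    rf = 1      # receptive field of one output unit, measured in input samples
    jump = 1    # spacing, in input samples, between adjacent units of the current layer
    params = 0
    length = input_len
    for layer in layers:
        rf += (layer.kernel - 1) * jump   # each layer widens the field by (k - 1) * jump
        jump *= layer.stride              # striding spreads later units further apart
        params += layer.kernel * layer.in_ch * layer.out_ch + layer.out_ch  # weights + biases
        length = math.ceil(length / layer.stride)  # output length with "same" padding
    return rf, params, length


def stack(n_layers, stride, channels=64, kernel=3):
    """Hypothetical stack: one 1-channel input layer followed by channels->channels layers."""
    layers = [ConvLayer(kernel, stride, 1, channels)]
    layers += [ConvLayer(kernel, stride, channels, channels) for _ in range(n_layers - 1)]
    return layers


if __name__ == "__main__":
    configs = [
        ("4 layers, stride 1", stack(4, 1)),
        ("16 layers, stride 1", stack(16, 1)),
        ("4 layers, stride 2", stack(4, 2)),
    ]
    for name, layers in configs:
        rf, params, out_len = summarize(layers, input_len=256)
        print(f"{name}: receptive field={rf}, params={params}, output length={out_len}")
```

With a 256-sample input, the stride-1 stack needs 16 layers (about five times the parameters of the 4-layer stack) to reach a receptive field of 33, while the strided 4-layer stack reaches 31 with the same parameter count as the 4-layer stride-1 stack but keeps only 16 output positions. A recurrent layer running along time or frequency sidesteps this, because its context along the recurrence axis is not bounded by depth or stride, which is the motivation the abstract gives for mixing recurrent and convolutional layers.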
