Joint Modeling of Accents and Acoustics for Multi-Accent Speech Recognition

Abstract

The performance of automatic speech recognition systems degrades with increasing mismatch between the training and testing scenarios. Differences in speaker accents are a significant source of such mismatch. The traditional approach to handling multiple accents pools data from several accents during training and builds a single model in multi-task fashion, where tasks correspond to individual accents. In this paper, we explore an alternative model in which we jointly learn an accent classifier and a multi-task acoustic model. Experiments on the American English Wall Street Journal and British English Cambridge corpora demonstrate that our joint model outperforms the strong multi-task acoustic model baseline. We obtain a 5.94% relative improvement in word error rate on British English and a 9.47% relative improvement on American English. This illustrates that jointly modeling accent information improves acoustic model performance.
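The abstract does not specify how the accent classifier and the multi-task acoustic model are coupled. As a hypothetical illustration only, the sketch below assumes a shared network whose per-frame outputs feed (a) one senone output layer per accent, as in a multi-task baseline, and (b) an accent classifier, with training minimizing a weighted sum of the two cross-entropy losses. The names `joint_loss` and `accent_weight` and the default weight of 0.1 are assumptions, not the paper's.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    # Negative log-probability of the target class.
    return -math.log(softmax(logits)[target])


def joint_loss(
    senone_logits_per_accent: Sequence[Sequence[float]],
    accent_logits: Sequence[float],
    senone_target: int,
    accent_target: int,
    accent_weight: float = 0.1,  # hypothetical interpolation weight
) -> float:
    # Acoustic task (assumed): score the frame with the output head of its
    # own accent, as a multi-task model with one head per accent would.
    acoustic = cross_entropy(senone_logits_per_accent[accent_target], senone_target)
    # Accent task (assumed): classify the frame's accent from the shared features.
    accent = cross_entropy(accent_logits, accent_target)
    # Joint objective: acoustic loss plus a weighted accent-classification loss.
    return acoustic + accent_weight * accent
```

With uniform logits over 4 senones and 2 accents, the loss is log 4 + 0.1 · log 2; confident, correct logits drive both terms toward zero. Under this assumed design, the accent term acts as an auxiliary objective that pushes the shared layers to encode accent information.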
