Joint Modeling of Accents and Acoustics for Multi-Accent Speech Recognition

  • 2018-02-07 22:05:18
  • Xuesong Yang, Kartik Audhkhasi, Andrew Rosenberg, Samuel Thomas, Bhuvana Ramabhadran, Mark Hasegawa-Johnson


The performance of automatic speech recognition systems degrades with increasing mismatch between the training and testing scenarios. Differences in speaker accents are a significant source of such mismatch. The traditional approach to deal with multiple accents involves pooling data from several accents during training and building a single model in multi-task fashion, where tasks correspond to individual accents. In this paper, we explore an alternate model where we jointly learn an accent classifier and a multi-task acoustic model. Experiments on the American English Wall Street Journal and British English Cambridge corpora demonstrate that our joint model outperforms the strong multi-task acoustic model baseline. We obtain a 5.94% relative improvement in word error rate on British English, and 9.47% relative improvement on American English. This illustrates that jointly modeling with accent information improves acoustic model performance.
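The abstract does not spell out the training objective, so the following is only a minimal sketch of one plausible way to train an accent classifier jointly with a multi-task acoustic model: a weighted sum of the two losses. Everything here is an assumption made for illustration: the function names, the interpolation weight `alpha`, the use of one output head per accent, and the per-frame cross-entropy that stands in for whatever acoustic loss the paper actually uses.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability of the target class."""
    return -math.log(softmax(logits)[target])


def joint_loss(acoustic_logits_by_accent, acoustic_target,
               accent_logits, accent_target, alpha=0.5):
    """Hypothetical joint objective (not taken from the paper).

    acoustic_logits_by_accent: one list of output logits per accent head
        (the multi-task part: each accent is its own task).
    accent_logits: logits of the accent classifier.
    alpha: assumed interpolation weight between the two losses.
    """
    # Multi-task acoustic loss: score only the head of the true accent.
    am_loss = cross_entropy(acoustic_logits_by_accent[accent_target],
                            acoustic_target)
    # Accent classification loss.
    ac_loss = cross_entropy(accent_logits, accent_target)
    return (1.0 - alpha) * am_loss + alpha * ac_loss


if __name__ == "__main__":
    # Toy values: two accents, three acoustic output classes.
    heads = [[2.0, 0.1, -1.0], [0.3, 0.2, 0.1]]
    print(joint_loss(heads, acoustic_target=0,
                     accent_logits=[1.5, -0.5], accent_target=0))
```

With `alpha = 0` this reduces to a plain multi-task acoustic loss, the kind of baseline the abstract compares against; larger values put more weight on predicting the accent correctly.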

