Sequence-based Multi-lingual Low Resource Speech Recognition

  • 2018-02-21 04:09:26
  • Siddharth Dalmia, Ramon Sanabria, Florian Metze, Alan W. Black

Abstract

Techniques for multi-lingual and cross-lingual speech recognition can help in low resource scenarios, to bootstrap systems and enable analysis of new languages and domains. End-to-end approaches, in particular sequence-based techniques, are attractive because of their simplicity and elegance. While it is possible to integrate traditional multi-lingual bottleneck feature extractors as front-ends, we show that end-to-end multi-lingual training of sequence models is effective on context independent models trained using Connectionist Temporal Classification (CTC) loss. We show that our model improves performance on Babel languages by over 6% absolute in terms of word/phoneme error rate when compared to mono-lingual systems built in the same setting for these languages. We also show that the trained model can be adapted cross-lingually to an unseen language using just 25% of the target data. We show that training on multiple languages is important for very low resource cross-lingual target scenarios, but not for multi-lingual testing scenarios. Here, it appears beneficial to include large well prepared datasets.

 
