Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition

  • 2019-07-13 06:27:24
  • Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Zhengqi Wen

Abstract

Integrating an external language model into a sequence-to-sequence speech recognition system is non-trivial. Previous works use linear interpolation or a fusion network to integrate external language models. However, these approaches introduce external components and increase decoding computation. In this paper, we instead propose a knowledge-distillation-based training approach to integrating external language models into a sequence-to-sequence model. A recurrent neural network language model, trained on large-scale external text, generates soft labels to guide the training of the sequence-to-sequence model. The language model thus plays the role of the teacher. This approach does not add any external component to the sequence-to-sequence model during testing, and it can be flexibly combined with shallow fusion for decoding. The experiments are conducted on the public Chinese datasets AISHELL-1 and CLMAD. Our approach achieves a character error rate of 9.3%, a relative reduction of 18.42% compared with the vanilla sequence-to-sequence model.
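
As a rough illustration of the training idea described in the abstract, the sketch below mixes the usual cross-entropy against the ground-truth token with a cross-entropy against soft labels produced by a teacher language model. It is a minimal single-step example and not the paper's exact formulation: the function names, the interpolation weight `lam`, and the `temperature` parameter are assumptions introduced here for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into a probability distribution (temperature is an assumed knob)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(logits):
    """Numerically stable log-probabilities via the log-sum-exp trick."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def distillation_loss(student_logits, teacher_probs, target_index, lam=0.5):
    """Hypothetical per-token loss: (1 - lam) * hard CE + lam * soft CE.

    student_logits: sequence-to-sequence decoder outputs for one step.
    teacher_probs:  soft labels from the language model for the same step.
    target_index:   index of the ground-truth token.
    lam:            assumed interpolation weight between the two terms.
    """
    log_probs = log_softmax(student_logits)
    hard = -log_probs[target_index]
    soft = -sum(q * lp for q, lp in zip(teacher_probs, log_probs))
    return (1.0 - lam) * hard + lam * soft


# Toy vocabulary of three tokens; the teacher is softened with temperature 2.0.
student = [2.0, 0.5, -1.0]
teacher = softmax([1.5, 1.0, -2.0], temperature=2.0)
loss = distillation_loss(student, teacher, target_index=0, lam=0.5)
```

Because the teacher only contributes to the loss at training time, nothing from the language model needs to be evaluated when decoding, which matches the abstract's point that no external component is added during testing.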

 
