Discriminative training of RNNLMs with the average word error criterion

Abstract

In automatic speech recognition (ASR), recurrent neural network language models (RNNLMs) are typically used to refine hypotheses in the form of lattices or n-best lists, which are generated by a beam search decoder with a weaker language model. The RNNLMs are usually trained generatively using the perplexity (PPL) criterion on large corpora of grammatically correct text. However, the hypotheses are noisy, and the RNNLM does not always make the choices that minimise the metric we optimise for, the word error rate (WER). To address this mismatch, we propose to use a task-specific loss to train an RNNLM to discriminate between multiple hypotheses within a lattice rescoring scenario. By fine-tuning the RNNLM on lattices with the average edit distance loss, we obtain a 1.9% relative improvement in word error rate over a purely generatively trained model.
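
To make the average word error criterion concrete, the sketch below computes it over an n-best list rather than a full lattice. This is a simplifying assumption, not the authors' implementation. Each hypothesis is assumed to carry a single combined log-score (for example acoustic score plus scaled RNNLM score). A softmax over these scores gives hypothesis posteriors, and the loss is the posterior-weighted word-level edit distance to the reference. The function names and the `scale` parameter are hypothetical and chosen for illustration. In actual fine-tuning, the RNNLM would produce the scores and an autodiff framework would back-propagate through the softmax.

```python
import math
from typing import List, Sequence


def word_edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Levenshtein distance between two word sequences."""
    prev = list(range(len(ref) + 1))  # distance from empty hyp prefix to ref[:j]
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # hyp word left unmatched
                curr[j - 1] + 1,         # ref word left unmatched
                prev[j - 1] + (h != r),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def posteriors(scores: Sequence[float], scale: float = 1.0) -> List[float]:
    """Numerically stable softmax over scaled hypothesis log-scores (hypothetical helper)."""
    scaled = [scale * s for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def average_word_error(
    hyps: Sequence[Sequence[str]],
    scores: Sequence[float],
    ref: Sequence[str],
    scale: float = 1.0,
) -> float:
    """Expected edit distance to the reference under the n-best posterior (assumed n-best approximation)."""
    post = posteriors(scores, scale)
    return sum(p * word_edit_distance(h, ref) for p, h in zip(post, hyps))


if __name__ == "__main__":
    ref = "the cat sat".split()
    hyps = [
        "the cat sat".split(),       # 0 errors
        "the cat sad".split(),       # 1 substitution
        "a cat sat down".split(),    # 1 substitution + 1 insertion
    ]
    print(average_word_error(hyps, [0.0, 0.0, 0.0], ref))  # uniform posteriors
    print(average_word_error(hyps, [2.0, 0.0, 0.0], ref))  # mass shifted to the correct hypothesis
```

With uniform scores the loss is the plain mean of the edit distances (0, 1 and 2). Raising the score of the correct hypothesis lowers the loss. Minimising this quantity therefore pushes the language model to rank low-error hypotheses higher, which is the discriminative behaviour the abstract describes.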
