Long-span language modeling for speech recognition

Abstract

We explore neural language modeling for speech recognition where the context spans multiple sentences. Rather than encode history beyond the current sentence using a cache of words or document-level features, we focus our study on the ability of LSTM and Transformer language models to implicitly learn to carry over context across sentence boundaries. We introduce a new architecture that incorporates an attention mechanism into LSTM to combine the benefits of recurrent and attention architectures. We conduct language modeling and speech recognition experiments on the publicly available LibriSpeech corpus. We show that conventional training on a paragraph-level corpus results in significant reductions in perplexity compared to training on a sentence-level corpus. We also describe speech recognition experiments using long-span language models in second-pass re-ranking, and provide insights into the ability of such models to take advantage of context beyond the current sentence.
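
To make the second-pass re-ranking idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes each utterance comes with an n-best list of (tokens, first-pass log score) pairs, that the language model score is added with a tunable weight, and that the previously selected hypotheses serve as the cross-sentence context. The names `rerank`, `decode_paragraph`, `toy_lm_logprob`, and `lm_weight` are hypothetical, and the toy scorer is only a stand-in for a trained LSTM or Transformer language model.

```python
import math
from typing import Callable, List, Sequence, Tuple

# (tokens, first-pass log score) for one candidate transcription
Hypothesis = Tuple[List[str], float]
# Maps (context tokens, hypothesis tokens) to a log-probability
LmScorer = Callable[[Sequence[str], Sequence[str]], float]


def rerank(nbest: List[Hypothesis], lm_logprob: LmScorer,
           context: Sequence[str], lm_weight: float = 0.5) -> List[str]:
    """Return the tokens of the hypothesis with the best combined score."""
    def combined(hyp: Hypothesis) -> float:
        tokens, first_pass = hyp
        # Assumed combination rule: first-pass score plus weighted LM score.
        return first_pass + lm_weight * lm_logprob(context, tokens)
    return max(nbest, key=combined)[0]


def decode_paragraph(nbest_lists: List[List[Hypothesis]], lm_logprob: LmScorer,
                     lm_weight: float = 0.5) -> List[List[str]]:
    """Re-rank utterances in order, carrying chosen text across sentences."""
    context: List[str] = []
    output: List[List[str]] = []
    for nbest in nbest_lists:
        best = rerank(nbest, lm_logprob, context, lm_weight)
        output.append(best)
        context.extend(best)  # assumption: the 1-best output becomes context
    return output


def toy_lm_logprob(context: Sequence[str], tokens: Sequence[str]) -> float:
    """Hypothetical stand-in scorer: words already seen in context are likelier.

    A real system would query a neural language model here instead.
    """
    seen = set(context)
    return sum(math.log(0.2 if t in seen else 0.01) for t in tokens)


paragraph = [
    [(["the", "whale", "surfaced"], -1.0)],
    [(["the", "wail", "was", "loud"], -2.0),
     (["the", "whale", "was", "loud"], -2.1)],
]
result = decode_paragraph(paragraph, toy_lm_logprob)
```

In this toy run, the second utterance resolves to "the whale was loud" once the first sentence is available as context, whereas re-ranking the same n-best list with an empty context keeps the first-pass favorite "the wail was loud".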
