Confusion2vec 2.0: Enriching Ambiguous Spoken Language Representations with Subwords

Abstract

Word vector representations enable machines to encode human language for spoken language understanding and processing. Confusion2vec, motivated by human speech production and perception, is a word vector representation which encodes ambiguities present in human spoken language in addition to semantic and syntactic information. Confusion2vec provides a robust spoken language representation by considering inherent human language ambiguities. In this paper, we propose a novel word vector space estimation by unsupervised learning on lattices output by an automatic speech recognition (ASR) system. We encode each word in the Confusion2vec vector space by its constituent subword character n-grams. We show that the subword encoding helps better represent the acoustic perceptual ambiguities in human spoken language via information modeled on lattice-structured ASR output. The usefulness of the proposed Confusion2vec representation is evaluated using semantic, syntactic and acoustic analogy and word similarity tasks. We also show the benefits of subword modeling for acoustic ambiguity representation on the task of spoken language intent detection. The results significantly outperform existing word vector representations when evaluated on erroneous ASR outputs. We demonstrate that Confusion2vec subword modeling eliminates the need for retraining/adapting the natural language understanding models on ASR transcripts.
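
As a rough illustration of what "constituent subword character n-grams" can mean, the following minimal sketch splits a word into overlapping character n-grams. The boundary markers `<` and `>`, the default n-gram range of 3 to 6, and the function names `char_ngrams` and `shared_ngrams` are assumptions borrowed from common subword-embedding practice; the abstract does not specify them, and this is not the authors' implementation.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of `word` for n in [n_min, n_max].

    The word is wrapped in '<' and '>' (an assumed convention) so that
    prefixes and suffixes are distinguishable from word-internal n-grams.
    """
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of width n across the marked word.
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams


def shared_ngrams(a, b, n_min=3, n_max=6):
    """Return the set of n-grams that two words have in common."""
    return set(char_ngrams(a, n_min, n_max)) & set(char_ngrams(b, n_min, n_max))


if __name__ == "__main__":
    print(char_ngrams("see", 3, 3))
    print(shared_ngrams("see", "sea", 3, 3))
```

Words that overlap in spelling, such as "see" and "sea", share some of their n-grams, so a representation built from n-gram vectors can give them related embeddings even when one of them is rare in the training data.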
