Spoken Language Intent Detection using Confusion2Vec

Abstract

Decoding a speaker's intent is a crucial part of spoken language understanding (SLU). The presence of noise or errors in text transcriptions in real-life scenarios makes the task more challenging. In this paper, we address spoken language intent detection under the noisy conditions imposed by automatic speech recognition (ASR) systems. We propose to employ the confusion2vec word feature representation to compensate for the errors made by ASR and to increase the robustness of the SLU system. Confusion2vec, motivated by human speech production and perception, models acoustic relationships between words in addition to the semantic and syntactic relations of words in human language. We hypothesize that ASR often makes errors involving acoustically similar words, and that confusion2vec, with its inherent model of acoustic relationships between words, is able to compensate for these errors. We demonstrate through experiments on the ATIS benchmark dataset that the proposed model is robust and achieves state-of-the-art results under noisy ASR conditions. Our system reduces classification error rate (CER) by 20.84% and improves robustness by 37.48% (lower CER degradation) relative to the previous state-of-the-art when going from clean to noisy transcripts. Improvements are also demonstrated when training the intent detection models on noisy transcripts.
