A Data Efficient End-To-End Spoken Language Understanding Architecture

Abstract

End-to-end architectures have recently been proposed for spoken language understanding (SLU) and semantic parsing. Trained on large amounts of data, these models jointly learn acoustic and linguistic-sequential features. Although such architectures give very good results for domain, intent and slot detection, applying them to more complex semantic chunking and tagging tasks is less straightforward. For this reason, the models are in many cases combined with an external language model to enhance their performance. In this paper we introduce a data-efficient system which is trained end-to-end, with no additional pre-trained external module. One key feature of our approach is an incremental training procedure in which the acoustic, language and semantic models are trained sequentially, one after the other. The proposed model has a reasonable size and achieves results competitive with the state of the art while using a small training dataset. In particular, we reach a 24.02% Concept Error Rate (CER) on MEDIA/test while training on MEDIA/train without any additional data.
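For readers unfamiliar with the metric, the sketch below shows how a Concept Error Rate is commonly computed: the number of substituted, deleted and inserted concepts (the edit distance between reference and hypothesis concept sequences) divided by the number of reference concepts. This is a minimal illustration assuming that standard definition, not the scoring tool used in the paper; the function names and the concept labels in the usage note are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two concept sequences."""
    # prev[j] holds the distance between the first i-1 reference
    # concepts and the first j hypothesis concepts.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def concept_error_rate(refs: Sequence[Sequence[str]],
                       hyps: Sequence[Sequence[str]]) -> float:
    """Corpus-level CER in percent (assumed definition: errors / reference concepts)."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must contain the same number of utterances")
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_ref = sum(len(r) for r in refs)
    if total_ref == 0:
        raise ValueError("reference contains no concepts")
    return 100.0 * total_errors / total_ref
```

As a usage example with made-up labels, `concept_error_rate([["command", "number", "room_type"], ["date"]], [["command", "room_type"], ["date", "time"]])` counts one deletion and one insertion against four reference concepts.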
