Acoustics Based Intent Recognition Using Discovered Phonetic Units for Low Resource Languages

Abstract

With recent advancements in language technologies, humans are now speaking to devices. Increasing the reach of spoken language technologies requires building systems in local languages. A major bottleneck is the data-intensive components underlying such systems, including automatic speech recognition (ASR) systems that require large amounts of labelled data. With the aim of aiding the development of spoken dialog systems in low-resource languages, we propose a novel acoustics-based intent recognition system that uses discovered phonetic units for intent classification. The system is made up of two blocks: the first is a universal phone recognition system that generates a transcript of discovered phonetic units for the input audio, and the second performs intent classification from the generated phonetic transcripts. We propose a CNN+LSTM based architecture and present results for two language families, Indic languages and Romance languages, on two different intent recognition tasks. We also perform multilingual training of our intent classifier and show improved cross-lingual transfer and zero-shot performance on an unknown language within the same language family.
