Cross-lingual and Multilingual Spoken Term Detection for Low-Resource Indian Languages

Abstract

Spoken Term Detection (STD) is the task of searching for words or phrases within audio, given either text or spoken input as a query. In this work, we use state-of-the-art Hindi, Tamil and Telugu ASR systems cross-lingually for lexical Spoken Term Detection in ten low-resource Indian languages. Since no publicly available dataset exists for Spoken Term Detection in these languages, we create a new dataset using a publicly available TTS dataset. We report a standard metric for STD, Mean Term Weighted Value (MTWV), and show that ASR systems built in languages that are phonetically similar to the target languages have higher accuracy; however, it is also possible to get high MTWV scores for dissimilar languages by using a relaxed phone matching algorithm. We propose a technique to bootstrap the Grapheme-to-Phoneme (g2p) mapping between all the languages under consideration using publicly available resources. Gains are obtained when we combine the output of multiple ASR systems and when we use language-specific Language Models. We show that it is possible to perform STD cross-lingually in a zero-shot manner without the need for any language-specific speech data. We plan to make the STD dataset available for other researchers interested in cross-lingual STD.
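
For readers unfamiliar with the metric, the following is a minimal sketch of how a term-weighted value is commonly computed, assuming the NIST STD 2006 convention: one non-target trial per second of speech, a weight of 999.9, and MTWV taken as the maximum of the term-weighted value over decision thresholds. The abstract does not state which variant the authors report, so this convention is an assumption. The names `Detection`, `twv` and `mtwv` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

BETA = 999.9  # weight from the NIST STD 2006 convention (assumed here)


@dataclass
class Detection:
    term: str         # query term that was hypothesised
    score: float      # system confidence for this hit
    is_correct: bool  # True if the hit matches a reference occurrence


def twv(detections, n_true, speech_seconds, threshold, beta=BETA):
    """Term-weighted value at one threshold.

    n_true maps each term to its number of reference occurrences.
    Terms with no reference occurrence are skipped, because their
    miss probability is undefined.
    """
    terms = [t for t, n in n_true.items() if n > 0]
    if not terms:
        raise ValueError("need at least one term with a reference occurrence")
    total = 0.0
    for term in terms:
        kept = [d for d in detections
                if d.term == term and d.score >= threshold]
        n_correct = sum(d.is_correct for d in kept)
        n_false = len(kept) - n_correct
        p_miss = 1.0 - n_correct / n_true[term]
        # Non-target trials: one per second of speech, minus true occurrences.
        p_fa = n_false / (speech_seconds - n_true[term])
        total += p_miss + beta * p_fa
    return 1.0 - total / len(terms)


def mtwv(detections, n_true, speech_seconds, beta=BETA):
    """Best term-weighted value over all candidate thresholds."""
    thresholds = sorted({d.score for d in detections})
    thresholds.append(float("inf"))  # rejecting every hit gives TWV = 0
    return max(twv(detections, n_true, speech_seconds, th, beta)
               for th in thresholds)
```

Under this convention a system that finds every occurrence with no false alarms scores 1, and a system that returns nothing scores 0. Because false alarms carry a large weight, a handful of spurious hits can push the value below zero.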
