Exploiting Spectral Augmentation for Code-Switched Spoken Language Identification

Abstract

Spoken language identification (LID) systems are needed to identify the language(s) present in a given audio sample, and could typically be the first step in many speech processing tasks such as automatic speech recognition (ASR). Automatic identification of the languages present in a speech signal is not only scientifically interesting, but also of practical importance in a multilingual country such as India. In many Indian cities, as many as three languages may get mixed when people interact with each other. These may include the official language of that province, Hindi, and English (at times the languages of the neighboring provinces may also get mixed in during these interactions). This makes the spoken LID task extremely challenging in the Indian context. While quite a few LID systems for Indian languages have been implemented, most such systems have used small-scale speech data collected internally within an organization. In the current work, we perform spoken LID on three Indian languages (Gujarati, Telugu, and Tamil) code-mixed with English. This task was organized by the Microsoft research team as a spoken LID challenge. In our work, we modify the usual spectral augmentation approach and propose a language mask that discriminates the language ID pairs, which leads to a noise-robust spoken LID system. The proposed method gives a relative improvement of approximately 3-5% in LID accuracy over a baseline system proposed by Microsoft on the three language pairs for the two shared tasks suggested in the challenge.
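
For readers unfamiliar with the "usual spectral augmentation" mentioned above, the following is a minimal sketch of conventional frequency and time masking applied to a spectrogram. It is not the proposed language mask, whose construction the abstract does not specify. The function name `spec_augment`, the maximum mask widths, and the choice to fill masked regions with zeros are all illustrative assumptions.

```python
import numpy as np


def spec_augment(spec, max_freq_width=8, max_time_width=20, rng=None):
    """Apply one frequency mask and one time mask to a (freq, time) array.

    Hypothetical helper for illustration only: widths and zero fill are
    assumed values, not settings taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = spec.copy()  # leave the caller's spectrogram untouched
    n_freq, n_time = out.shape

    # Frequency mask: blank out f consecutive frequency bins.
    f = int(rng.integers(0, min(max_freq_width, n_freq) + 1))
    f0 = int(rng.integers(0, n_freq - f + 1))
    out[f0:f0 + f, :] = 0.0

    # Time mask: blank out t consecutive frames.
    t = int(rng.integers(0, min(max_time_width, n_time) + 1))
    t0 = int(rng.integers(0, n_time - t + 1))
    out[:, t0:t0 + t] = 0.0

    return out


if __name__ == "__main__":
    # Toy input: 40 frequency bins by 100 frames of ones.
    toy = np.ones((40, 100))
    augmented = spec_augment(toy, rng=np.random.default_rng(0))
    print("masked fraction:", float((augmented == 0.0).mean()))
```

In training, a mask like this would typically be redrawn for every utterance so the model cannot rely on any single frequency band or time span.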
