Fine-grained Language Identification with Multilingual CapsNet Model

  • 2020-07-12 20:01:22
  • Mudit Verma, Arun Balaji Buduru

Abstract

Due to a drastic improvement in the quality of internet services worldwide, there is an explosion of multilingual content generation and consumption. This is especially prevalent in countries with large multilingual audiences, who are increasingly consuming media outside their linguistic familiarity or preference. Hence, there is an increasing need for real-time and fine-grained content analysis services, including language identification, content transcription, and analysis. Accurate and fine-grained spoken language detection is an essential first step for all subsequent content analysis algorithms. Current techniques in spoken language detection may fall short on one or more of these fronts: accuracy, fine-grained detection, data requirements, and manual effort in data collection and pre-processing. Hence, in this work, we present a real-time approach that detects the spoken language from 5-second audio clips with an accuracy of 91.8%, with very small data requirements and minimal pre-processing. We propose novel Capsule Network architectures that operate on spectrogram images of the provided audio snippets. We compare our results against previous approaches based on Recurrent Neural Networks and iVectors. Finally, we present a "Non-Class" analysis to further show why the CapsNet architecture works for the language identification (LID) task.

 
