Language ID Prediction from Speech Using Self-Attentive Pooling and 1D-Convolutions

  • 2021-04-24 16:41:17
  • Roman Bedyakin, Nikolay Mikhaylovskiy
  • 1


This memo describes the NTR-TSU submission for the SIGTYP 2021 Shared Task on predicting language IDs from speech. Spoken Language Identification (LID) is an important step in a multilingual Automated Speech Recognition (ASR) system pipeline. For many low-resource and endangered languages, only single-speaker recordings may be available, creating a need for domain- and speaker-invariant language ID systems. In this memo, we show that a convolutional neural network with a Self-Attentive Pooling layer achieves promising results on the language identification task.
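
To make the pooling idea concrete, the following is a minimal sketch of one common formulation of self-attentive pooling, written in plain Python. It is not taken from the submission: the scoring function (e_t = v · tanh(W h_t + b)), the parameter names `W`, `b`, `v`, and the toy dimensions are all assumptions chosen for illustration. In a system like the one described, the frame features would presumably come from the 1D-convolutional layers, and the pooled vector would feed a language classifier.

```python
import math
import random


def self_attentive_pooling(frames, W, b, v):
    """Pool a T x D sequence of frame features into a single D-dim vector.

    Assumed formulation (not confirmed by the source):
      score_t  = v . tanh(W h_t + b)
      weight_t = softmax over time of score_t
      pooled   = sum_t weight_t * h_t
    """
    scores = []
    for h in frames:
        # Project the frame into an attention space of size len(W).
        hidden = [
            math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i)
            for row, b_i in zip(W, b)
        ]
        scores.append(sum(v_i * u_i for v_i, u_i in zip(v, hidden)))

    # Numerically stable softmax over the time axis.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted average of the frames, one value per feature dimension.
    dim = len(frames[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, frames)) for d in range(dim)]
    return pooled, weights


# Toy input: 5 frames of 4 features, attention space of size 3 (all hypothetical).
rng = random.Random(0)
T, D, A = 5, 4, 3
frames = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(T)]
W = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(A)]
b = [0.0] * A
v = [rng.uniform(-1, 1) for _ in range(A)]

pooled, weights = self_attentive_pooling(frames, W, b, v)
```

Because the weights are a softmax over time, the output has a fixed size regardless of utterance length, and setting `v` to all zeros reduces the layer to plain mean pooling.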

