SIGTYP 2021 Shared Task: Robust Spoken Language Identification

  • 2021-06-07 18:12:27
  • Elizabeth Salesky, Badr M. Abdullah, Sabrina J. Mielke, Elena Klyachko, Oleg Serikov, Edoardo Ponti, Ritesh Kumar, Ryan Cotterell, Ekaterina Vylomova

Abstract

While language identification is a fundamental speech and language processing task, for many languages and language families it remains a challenging one. For many low-resource and endangered languages this is in part due to resource availability: where larger datasets exist, they may be single-speaker or cover different domains than the desired application scenarios, creating a need for domain- and speaker-invariant language identification systems. This year's shared task on robust spoken language identification sought to investigate just this scenario: systems were to be trained on largely single-speaker speech from one domain, but evaluated on data in other domains recorded from speakers under different recording circumstances, mimicking realistic low-resource scenarios. We see that domain and speaker mismatch proves very challenging for current methods, which can perform above 95% accuracy in-domain. Domain adaptation can address this mismatch to some degree, but these conditions merit further investigation to make spoken language identification accessible in many scenarios.

 
