Abstract
While language identification is a fundamental speech and language processing task, it remains challenging for many languages and language families. For many low-resource and endangered languages this is in part due to resource availability: where larger datasets exist, they may be single-speaker or cover different domains than the desired application scenarios, creating a need for domain- and speaker-invariant language identification systems. This year's shared task on robust spoken language identification sought to investigate just this scenario: systems were to be trained on largely single-speaker speech from one domain, but evaluated on data in other domains recorded from speakers under different recording circumstances, mimicking realistic low-resource scenarios. We see that domain and speaker mismatch proves very challenging for current methods, which can perform above 95% accuracy in-domain. Domain adaptation can address this to some degree, but these conditions merit further investigation to make spoken language identification accessible in many scenarios.