Abstract
Language identification greatly impacts the success of downstream tasks such as automatic speech recognition. Recently, self-supervised speech representations learned by wav2vec 2.0 have been shown to be very effective for a range of speech tasks. We extend previous self-supervised work on language identification by experimenting with models pre-trained on real-world, unconstrained speech in multiple languages rather than on English alone. We show that models pre-trained on many languages perform better and enable language identification systems that require very little labeled data to perform well. Results on a 26-language setup show that with only 10 minutes of labeled data per language, a cross-lingually pre-trained model can achieve over 89.2% accuracy.