Abstract
This paper addresses the challenge of integrating low-resource languages into multilingual automatic speech recognition (ASR) systems. We introduce a novel application of weighted cross-entropy, typically used for unbalanced datasets, to facilitate the integration of low-resource languages into pre-trained multilingual ASR models within the context of continual multilingual learning. We fine-tune the Whisper multilingual ASR model on five high-resource languages and one low-resource language, employing language-weighted dynamic cross-entropy and data augmentation. The results show a remarkable 6.69% word error rate (WER) reduction for the low-resource language compared to the fine-tuned model without applying our approach, and a 48.86% WER reduction compared to the original Whisper model. In addition, our approach yields an average WER reduction of 3.29% across the six languages, showing no degradation for the high-resource languages.
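
To make the idea of language-weighted cross-entropy concrete, the following is a minimal sketch in plain Python and not the implementation used in this work. It scales each token's negative log-likelihood by a weight attached to the language of its utterance. The function name, the language labels "high" and "low", the weight value 2.0, and the normalization by token count are all illustrative assumptions; the abstract does not specify how the weights are set or how they change during training, so the sketch uses fixed weights.

```python
import math


def language_weighted_cross_entropy(target_probs, languages, language_weights):
    """Mean token-level cross-entropy with a per-language weight (illustrative only).

    target_probs[i]  : probabilities the model assigned to the correct tokens of utterance i
    languages[i]     : language label of utterance i (hypothetical labels)
    language_weights : maps a language label to its loss weight; missing labels default to 1.0
    """
    total = 0.0
    n_tokens = 0
    for probs, lang in zip(target_probs, languages):
        weight = language_weights.get(lang, 1.0)
        for p in probs:
            # Standard cross-entropy term for one token, scaled by the language weight.
            total += -weight * math.log(p)
            n_tokens += 1
    # Assumption: normalize by the number of tokens, not by the sum of weights.
    return total / n_tokens


# Toy batch: one high-resource utterance and one low-resource utterance.
batch_probs = [[0.9, 0.8], [0.5, 0.4]]
batch_langs = ["high", "low"]

uniform_loss = language_weighted_cross_entropy(batch_probs, batch_langs, {})
boosted_loss = language_weighted_cross_entropy(batch_probs, batch_langs, {"low": 2.0})
```

With a weight above 1.0 on the low-resource label, errors on that language contribute more to the loss, so `boosted_loss` exceeds `uniform_loss` on this toy batch while the high-resource terms are unchanged.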