VoxArabica: A Robust Dialect-Aware Arabic Speech Recognition System

Abstract

Arabic is a complex language with many varieties and dialects, spoken by over 450 million people around the world. Because of this linguistic diversity and variation, it is challenging to build a robust and generalized ASR system for Arabic. In this work, we address this challenge by developing and demonstrating a system, dubbed VoxArabica, for dialect identification (DID) as well as automatic speech recognition (ASR) of Arabic. We train a wide range of models, such as HuBERT (DID), Whisper, and XLS-R (ASR), in a supervised setting for Arabic DID and ASR tasks. Our DID models are trained to identify 17 different dialects in addition to MSA. We fine-tune our ASR models on MSA, Egyptian, Moroccan, and mixed data. Additionally, for the remaining dialects in ASR, we provide the option to choose among various models, such as Whisper and MMS, in a zero-shot setting. We integrate these models into a single web interface with diverse features such as audio recording, file upload, model selection, and the option to raise flags for incorrect outputs. Overall, we believe VoxArabica will be useful for a wide range of audiences concerned with Arabic research. Our system is currently running at https://cdce-206-12-100-168.ngrok.io/.
