Dolphin: A Large-Scale Automatic Speech Recognition Model for Eastern Languages

Abstract

This report introduces Dolphin, a large-scale multilingual automatic speech recognition (ASR) model that extends the Whisper architecture to support a wider range of languages. Our approach integrates in-house proprietary and open-source datasets to refine and optimize Dolphin's performance. The model is specifically designed to achieve notable recognition accuracy for 40 Eastern languages across East Asia, South Asia, Southeast Asia, and the Middle East, while also supporting 22 Chinese dialects. Experimental evaluations show that Dolphin significantly outperforms current state-of-the-art open-source models across various languages. To promote reproducibility and community-driven innovation, we are making our trained models and inference source code publicly available.
