Seewo's Submission to MLC-SLM: Lessons learned from Speech Reasoning Language Models

Abstract

This paper presents Seewo's systems for both tracks of the Multilingual Conversational Speech Language Model Challenge (MLC-SLM), addressing automatic speech recognition (ASR) and speaker diarization with ASR (SD-ASR). We introduce a multi-stage training pipeline that explicitly enhances reasoning and self-correction in speech language models for ASR. Our approach combines curriculum learning for progressive capability acquisition, Chain-of-Thought data augmentation to foster intermediate reflection, and Reinforcement Learning with Verifiable Rewards (RLVR) to further refine self-correction through reward-driven optimization. This approach achieves substantial improvements over the official challenge baselines. On the evaluation set, our best system attains a WER/CER of 11.57% for Track 1 and a tcpWER/tcpCER of 17.67% for Track 2. Comprehensive ablation studies demonstrate the effectiveness of each component under challenge constraints.
