Abstract
Language mismatch affects speech anti-spoofing systems, yet investigations and quantification of these effects remain limited. Existing anti-spoofing datasets are mainly in English, and the high cost of acquiring multilingual datasets hinders the training of language-independent models. We begin by evaluating top-performing speech anti-spoofing systems that are trained on English data and tested on other languages, and we observe notable performance declines. We propose Accent-based data expansion via TTS (ACCENT), an approach that introduces diverse linguistic knowledge into monolingual-trained models and improves their cross-lingual capabilities. We conduct experiments on a large-scale dataset of over 3 million samples across 12 languages, comprising 1.8 million training samples and nearly 1.2 million testing samples. We provide a preliminary quantification of the language mismatch effects and show that applying the proposed ACCENT reduces them by over 15%. This easily implementable method shows promise for multilingual and low-resource language scenarios.