Non-native Speaker Verification for Spoken Language Assessment

Abstract

Automatic spoken language assessment systems are becoming more popular in order to handle increasing interest in second language learning. One challenge for these systems is detecting malpractice. Malpractice can take a range of forms; this paper focuses on detecting when a candidate attempts to impersonate another in a speaking test. This form of malpractice is closely related to speaker verification, but applied in the specific domain of spoken language assessment. Advanced speaker verification systems, which leverage deep-learning approaches to extract speaker representations, have been successfully applied to a range of native speaker verification tasks. In this paper, these systems are explored for non-native spoken English data. The data used for speaker enrolment and verification is mainly taken from the BULATS test, which assesses English language skills for business. The performance of systems trained on relatively limited amounts of BULATS data is compared with that of systems trained on standard large speaker verification corpora. Experimental results on large-scale test sets with millions of trials show that the best performance is achieved by adapting the imported model to non-native data. A breakdown of impostor trials across different first languages (L1s) and grades is analysed, which shows that inter-L1 impostors are more challenging for speaker verification systems.
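The abstract does not state which scoring back-end or evaluation metric is used. As a hedged illustration only, assume each enrolment and test recording is mapped to a fixed-length speaker embedding, that a trial is scored by cosine similarity, and that systems are compared by equal error rate (EER), a metric commonly reported for verification trials. The function names and toy scores below are hypothetical and are not taken from the paper.

```python
import math


def cosine_score(enrol, test):
    """Cosine similarity between an enrolment and a test speaker embedding (assumed back-end)."""
    dot = sum(a * b for a, b in zip(enrol, test))
    norm = math.sqrt(sum(a * a for a in enrol)) * math.sqrt(sum(b * b for b in test))
    return dot / norm


def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over every observed score."""
    best_gap, best_eer = float("inf"), None
    for t in sorted(set(target_scores) | set(impostor_scores)):
        # False rejection: genuine (target) trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: impostor trials scored at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        # Keep the threshold where the two error rates are closest.
        gap = abs(frr - far)
        if gap < best_gap:
            best_gap, best_eer = gap, (frr + far) / 2
    return best_eer


if __name__ == "__main__":
    # Hypothetical toy trial scores, not results from the paper.
    targets = [0.9, 0.8, 0.7, 0.4]
    impostors = [0.1, 0.2, 0.5, 0.3]
    print(equal_error_rate(targets, impostors))  # prints 0.25
```

A per-L1 breakdown like the one described above could, under the same assumptions, be obtained by calling `equal_error_rate` separately on impostor trials whose two speakers share an L1 and on those whose L1s differ. The exhaustive threshold sweep is quadratic in the number of trials; for test sets with millions of trials, a sort-based sweep would be the practical choice.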
