Methods of Automatic Matrix Language Determination for Code-Switched Speech

  • 2024-11-14 19:36:43
  • Olga Iakovenko, Thomas Hain

Abstract

Code-switching (CS) is the process of speakers alternating between two or more languages, a practice that is becoming increasingly common in the modern world. To better describe CS speech, the Matrix Language Frame (MLF) theory introduces the concept of a Matrix Language (ML), the language that provides the grammatical structure for a CS utterance. In this work, the MLF theory was used to develop systems for Matrix Language Identity (MLID) determination. The MLID of English/Mandarin and English/Spanish CS text and speech was compared to acoustic language identity (LID), which is a typical way to identify the language of a monolingual utterance. MLID predictors from audio show a higher correlation with the textual principles than LID in all cases. They also outperform LID in an MLID recognition task, achieving an F1 macro of 60% and a correlation score of 0.38. This novel approach shows that the non-English languages (Mandarin and Spanish) are preferred over English as the ML, contrary to the monolingual choice made by LID.
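The F1 macro figure reported above is the unweighted mean of the per-class F1 scores over the candidate Matrix Languages, so a system that almost always predicts one language (for example, English) is penalised on the other class. The following is a minimal Python sketch of that metric, not the authors' evaluation code. It assumes that each CS utterance has exactly one reference ML label and one predicted label. The function name `macro_f1` and the example labels are hypothetical.

```python
from collections import Counter


def macro_f1(reference, predicted):
    """Unweighted mean of per-class F1, where F1 = 2TP / (2TP + FP + FN).

    Hypothetical helper: assumes one reference and one predicted
    Matrix Language label per code-switched utterance.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    # Every label seen in either list counts as a class.
    labels = sorted(set(reference) | set(predicted))
    tp, fp, fn = Counter(), Counter(), Counter()
    for ref, hyp in zip(reference, predicted):
        if ref == hyp:
            tp[ref] += 1          # correct ML prediction
        else:
            fp[hyp] += 1          # predicted class was wrong
            fn[ref] += 1          # reference class was missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    # Macro averaging: every ML class carries equal weight.
    return sum(scores) / len(scores)


# Hypothetical labels for five English/Mandarin utterances.
reference = ["Mandarin", "Mandarin", "English", "Mandarin", "English"]
predicted = ["Mandarin", "English", "English", "Mandarin", "Mandarin"]
print(round(macro_f1(reference, predicted), 3))  # 0.583
```

In this toy example, the Mandarin class has F1 = 2/3 and the English class has F1 = 1/2, so the macro average is 7/12. A frequency-weighted average would instead favour the more common class.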

 
