Evaluating Code-Mixing in LLMs Across 18 Languages

Abstract

Code-mixing, the practice of switching between languages within a conversation, presents unique challenges for traditional natural language processing. Existing benchmarks, such as LinCE and GLUECoS, are limited by narrow language pairings and tasks, failing to adequately evaluate the code-mixing capabilities of large language models (LLMs). Despite the significance of code-mixing for multilingual users, research on LLMs in this context remains limited. Additionally, current methods for generating code-mixed data are underdeveloped. In this paper, we conduct a comprehensive evaluation of LLMs' performance on code-mixed data across 18 languages from seven language families. We also propose a novel approach for generating synthetic code-mixed texts by combining word substitution with GPT-4 prompting. Our analysis reveals consistent underperformance of LLMs on code-mixed datasets involving multiple language families. We suggest that improvements in training data size, model scale, and few-shot learning could enhance their performance.
