The Impact of Language Mixing on Bilingual LLM Reasoning

Abstract

Proficient multilingual speakers often intentionally switch languages in the middle of a conversation. Similarly, recent reasoning-focused bilingual large language models (LLMs) with strong capabilities in both languages exhibit language mixing, alternating languages within their chain of thought. Discouraging this behavior in DeepSeek-R1 was found to degrade accuracy, suggesting that language mixing may benefit reasoning. In this work, we study language switching in Chinese-English bilingual reasoning models. We identify reinforcement learning with verifiable rewards (RLVR) as the critical training stage that leads to language mixing. We demonstrate that language mixing can enhance reasoning: enforcing monolingual decoding reduces accuracy by 5.6 percentage points on math reasoning tasks. Additionally, a lightweight probe can be trained to predict whether a potential language switch would benefit or harm reasoning, and when used to guide decoding, increases accuracy by up to 6.25 percentage points. Our findings suggest that language mixing is not merely a byproduct of multilingual training, but is a strategic reasoning behavior.
