Language Mixing in Reasoning Language Models: Patterns, Impact, and Internal Causes

Abstract

Reasoning language models (RLMs) excel at complex tasks by leveraging a chain-of-thought process to generate structured intermediate steps. However, language mixing, i.e., reasoning steps containing tokens from languages other than the prompt, has been observed in their outputs and shown to affect performance, though its impact remains debated. We present the first systematic study of language mixing in RLMs, examining its patterns, impact, and internal causes across 15 languages, 7 task difficulty levels, and 18 subject areas, and show how all three factors influence language mixing. Moreover, we demonstrate that the choice of reasoning language significantly affects performance: forcing models to reason in Latin or Han scripts via constrained decoding notably improves accuracy. Finally, we show that the script composition of reasoning traces closely aligns with that of the model's internal representations, indicating that language mixing reflects latent processing preferences in RLMs. Our findings provide actionable insights for optimizing multilingual reasoning and open new directions for controlling reasoning languages to build more interpretable and adaptable RLMs.
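
To make the notion of "script composition" concrete, the following is a minimal sketch of how the share of each writing script in a reasoning trace could be estimated. It is an illustration only and not the authors' measurement procedure: the function names `script_of` and `script_composition` are hypothetical, and the script label is approximated by the first word of each character's Unicode name, a heuristic that is coarse for some characters (for example, fullwidth Latin letters).

```python
import unicodedata
from collections import Counter


def script_of(ch: str) -> str:
    """Approximate a character's script from its Unicode name (heuristic)."""
    name = unicodedata.name(ch, "")
    if not name:
        return "Unknown"
    first = name.split()[0]
    # Unicode names CJK ideographs "CJK UNIFIED IDEOGRAPH-XXXX"; label them Han.
    return "Han" if first == "CJK" else first.capitalize()


def script_composition(text: str) -> dict[str, float]:
    """Return the fraction of alphabetic characters belonging to each script."""
    counts = Counter(script_of(ch) for ch in text if ch.isalpha())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {script: n / total for script, n in counts.items()}


if __name__ == "__main__":
    # A trace that mixes Latin and Han characters; digits and spaces are ignored.
    trace = "First compute 2 + 2, 然后 check the result"
    print(script_composition(trace))
```

Under this sketch, a trace would count as language-mixed when more than one script has a nonzero share; any threshold for what counts as substantial mixing would be a separate assumption.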
