Abstract
Large language models (LLMs) have shown impressive multilingual capabilities through pretraining on diverse corpora. Although these models show strong reasoning abilities, their performance varies significantly across languages due to the imbalanced distribution of training data. Existing approaches that rely on sample-level translation for extensive multilingual pretraining and cross-lingual tuning face scalability challenges and often fail to capture nuanced reasoning processes across languages. In this paper, we introduce AdaMCOT (Adaptive Multilingual Chain-of-Thought), a framework that enhances multilingual factual reasoning by dynamically routing thought processes through intermediary "thinking languages" before generating target-language responses. AdaMCOT leverages a language-agnostic core and incorporates an adaptive, reward-based mechanism for selecting optimal reasoning pathways without requiring additional pretraining. Our comprehensive evaluation across multiple benchmarks demonstrates substantial improvements in both factual reasoning quality and cross-lingual consistency, with particularly strong performance gains in low-resource language settings. An in-depth analysis of the model's hidden states and semantic space further elucidates the underlying mechanism of our method. The results suggest that adaptive reasoning paths can effectively bridge the performance gap between high- and low-resource languages while maintaining cultural and linguistic nuances.