Abstract
Large language models (LLMs) have exhibited impressive multilingual reasoning capabilities, driven by extensive multilingual pre-training corpora and instruction fine-tuning data. However, a performance gap exists between high- and low-resource language reasoning tasks due to the language imbalance in the pre-training corpus, which is exacerbated by evaluation bias in existing reasoning benchmarks that lack low-resource language coverage. To alleviate this issue, we propose LinguaLIFT, a two-stage instruction tuning framework for advancing low-resource language reasoning. LinguaLIFT employs a language alignment layer to capture multilingual alignment through code-switched tuning, without requiring multilingual instruction or parallel data, thereby transferring cross-lingual reasoning capabilities to low-resource languages using English-only instruction tuning data. To comprehensively evaluate multilingual reasoning capabilities, we introduce the Multilingual Math World Problem (MMWP) benchmark, which spans 21 low-resource, 17 medium-resource, and 10 high-resource languages. Experimental results show that LinguaLIFT outperforms several competitive baselines across MMWP and four widely used benchmarks.