Abstract
Adapting large language models to other languages typically employs supervised fine-tuning (SFT) as a standard approach. However, it often suffers from an overemphasis on English performance, a phenomenon that is especially pronounced in data-constrained environments. To overcome these challenges, we propose \textbf{Cross-Lingual Optimization (CLO)}, which efficiently transfers an English-centric LLM to a target language while preserving its English capabilities. CLO utilizes publicly available English SFT data and a translation model to enable cross-lingual transfer. We conduct experiments using five models on six languages with varying levels of resource availability. Our results show that CLO consistently outperforms SFT in both acquiring target language proficiency and maintaining English performance. Remarkably, in low-resource languages, CLO with only 3,200 samples surpasses SFT with 6,400 samples, demonstrating that CLO can achieve better performance with less data. Furthermore, we find that SFT is particularly sensitive to data quantity in medium- and low-resource languages, whereas CLO remains robust. Our comprehensive analysis emphasizes the limitations of SFT and incorporates additional training strategies in CLO to enhance efficiency.