Abstract
Multilingual translation remains a challenging task for large language models (LLMs), which must handle intricate language patterns and avoid the stilted output that often arises in automated translation. In this paper, we introduce Seed-X, a family of open-source LLMs comprising instruct and reasoning models that push the limits of translation capability at the 7B-parameter scale. The base model is pre-trained on a diverse, high-quality dataset encompassing both monolingual and bilingual content across 28 languages, harnessing the full potential of multilingual data. The instruct model is then fine-tuned to translate via Chain-of-Thought (CoT) reasoning and further enhanced through reinforcement learning (RL) to achieve better generalization across diverse language pairs. Seed-X achieves performance comparable to leading closed-source models, including Gemini-2.5 and GPT-4o, across 28 languages, and significantly outperforms larger open-source models in both automatic metrics and human evaluations. We share the best practices from our optimization process and make the model parameters publicly available to advance translation research and applications.