Abstract
Recently, deep reasoning LLMs (e.g., OpenAI o1/o3 and DeepSeek-R1) have shown promising performance in various complex tasks. Free translation is an important and interesting task in the multilingual world, which requires going beyond word-for-word translation and taking cultural differences into account. This task is still under-explored in deep reasoning LLMs. In this paper, we introduce DeepTrans, a deep reasoning translation model that learns free translation via reinforcement learning. Specifically, we carefully build a reward model with pre-defined scoring criteria on both the translation results and the thought process. Given the source sentences, the reward model teaches the deep translation model how to think and free-translate them during reinforcement learning. In this way, training DeepTrans does not need any labeled translations, avoiding the human-intensive annotation or resource-intensive data synthesis. Experimental results show the effectiveness of DeepTrans. Using Qwen2.5-7B as the backbone, DeepTrans improves performance by 16.3% in literature translation, and outperforms strong deep reasoning baselines as well as baselines that are fine-tuned with synthesized data. Moreover, we summarize the failures and interesting findings during our RL exploration. We hope this work could inspire other researchers in free translation.