Deep Reasoning Translation via Reinforcement Learning

Abstract

Recently, deep reasoning LLMs (e.g., OpenAI o1/o3 and DeepSeek-R1) have shown promising performance in various complex tasks. Free translation is an important and interesting task in the multilingual world, which requires going beyond word-for-word translation and taking cultural differences into account. This task is still under-explored in deep reasoning LLMs. In this paper, we introduce DeepTrans, a deep reasoning translation model that learns free translation via reinforcement learning. Specifically, we carefully build a reward model with pre-defined scoring criteria on both the translation results and the thought process. Given the source sentences, the reward model teaches the deep translation model how to think and free-translate them during reinforcement learning. In this way, training DeepTrans does not need any labeled translations, avoiding human-intensive annotation and resource-intensive data synthesis. Experimental results show the effectiveness of DeepTrans. Using Qwen2.5-7B as the backbone, DeepTrans improves performance by 16.3% in literature translation, and outperforms strong deep reasoning baselines as well as baselines that are fine-tuned with synthesized data. Moreover, we summarize the failures and interesting findings during our RL exploration. We hope this work can inspire other researchers in free translation.
