Abstract
Inference scaling empowers LLMs with unprecedented reasoning ability, with reinforcement learning as the core technique to elicit complex reasoning. However, key technical details of state-of-the-art reasoning LLMs are concealed (such as in the OpenAI o1 blog and the DeepSeek R1 technical report), so the community still struggles to reproduce their RL training results. We propose the $\textbf{D}$ecoupled Clip and $\textbf{D}$ynamic s$\textbf{A}$mpling $\textbf{P}$olicy $\textbf{O}$ptimization ($\textbf{DAPO}$) algorithm, and fully open-source a state-of-the-art large-scale RL system that achieves 50 points on AIME 2024 using the Qwen2.5-32B base model. Unlike previous works that withhold training details, we introduce four key techniques of our algorithm that make large-scale LLM RL a success. In addition, we open-source our training code, which is built on the verl framework, along with a carefully curated and processed dataset. These components of our open-source system enhance reproducibility and support future research in large-scale LLM RL.