Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model

Abstract

We present Ring-1T, the first open-source, state-of-the-art thinking model at trillion-parameter scale. It has 1 trillion total parameters and activates approximately 50 billion per token. Training such models at a trillion-parameter scale introduces unprecedented challenges, including train-inference misalignment, inefficiencies in rollout processing, and bottlenecks in the RL system. To address these, we pioneer three interconnected innovations: (1) IcePop stabilizes RL training via token-level discrepancy masking and clipping, resolving instability from training-inference mismatches; (2) C3PO++ improves resource utilization for long rollouts under a token budget by dynamically partitioning them, thereby obtaining high time efficiency; and (3) ASystem, a high-performance RL framework designed to overcome the systemic bottlenecks that impede trillion-parameter model training. Ring-1T delivers breakthrough results across critical benchmarks: 93.4 on AIME-2025, 86.72 on HMMT-2025, 2088 on CodeForces, and 55.94 on ARC-AGI-v1. Notably, it attains a silver medal-level result on the IMO-2025, underscoring its exceptional reasoning capabilities. By releasing the complete 1T parameter MoE model, we provide the research community with direct access to cutting-edge reasoning capabilities. This contribution marks a significant milestone in democratizing large-scale reasoning intelligence and establishes a new baseline for open-source model performance.
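The abstract does not spell out how IcePop's token-level discrepancy masking works, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that each sampled token has a log-probability from the training engine and one from the inference engine, and that tokens whose probability ratio falls outside a band are excluded from the loss. The function names `discrepancy_mask` and `masked_mean_loss` and the bounds `low` and `high` are hypothetical and are not taken from the paper.

```python
import math


def discrepancy_mask(train_logprobs, infer_logprobs, low=0.5, high=2.0):
    """Return a 0/1 mask per token: 1.0 keeps the token, 0.0 drops it.

    Assumption: the discrepancy is measured as the ratio between the
    probability the training engine assigns to a sampled token and the
    probability the inference engine assigned to it. The bounds `low`
    and `high` are illustrative values, not values from the paper.
    """
    mask = []
    for lp_train, lp_infer in zip(train_logprobs, infer_logprobs):
        ratio = math.exp(lp_train - lp_infer)
        mask.append(1.0 if low <= ratio <= high else 0.0)
    return mask


def masked_mean_loss(token_losses, mask):
    """Average the per-token losses over the kept tokens only."""
    kept = sum(mask)
    if kept == 0:
        return 0.0  # every token was masked out, so nothing contributes
    return sum(loss * m for loss, m in zip(token_losses, mask)) / kept


# Three sampled tokens; the second and third disagree strongly
# between the two engines (ratios of about 3.0 and 0.25).
train_lp = [math.log(0.5), math.log(0.9), math.log(0.1)]
infer_lp = [math.log(0.5), math.log(0.3), math.log(0.4)]

mask = discrepancy_mask(train_lp, infer_lp)
loss = masked_mean_loss([1.0, 2.0, 3.0], mask)
```

In this toy example only the first token stays inside the band, so it alone contributes to the loss. A real RL objective would also apply the clipping the abstract mentions, which this sketch leaves out.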
