RLinf: Flexible and Efficient Large-scale Reinforcement Learning via Macro-to-Micro Flow Transformation

  • 2025-09-19 13:24:17
  • Chao Yu, Yuanqing Wang, Zhen Guo, Hao Lin, Si Xu, Hongzhi Zang, Quanlu Zhang, Yongji Wu, Chunyang Zhu, Junhao Hu, Zixiao Huang, Mingjie Wei, Yuqing Xie, Ke Yang, Bo Dai, Zhexuan Xu, Xiangyuan Wang, Xu Fu, Zhihao Liu, Kang Chen, Weilin Liu, Gang Liu, Boxun Li, Jianlei Yang, Zhi Yang, Guohao Dai, Yu Wang

Abstract

Reinforcement learning (RL) has demonstrated immense potential in advancing artificial general intelligence, agentic intelligence, and embodied intelligence. However, the inherent heterogeneity and dynamicity of RL workflows often lead to low hardware utilization and slow training on existing systems. In this paper, we present RLinf, a high-performance RL training system based on our key observation that the major roadblock to efficient RL training lies in system flexibility. To maximize flexibility and efficiency, RLinf is built atop a novel RL system design paradigm called macro-to-micro flow transformation (M2Flow), which automatically breaks down high-level, easy-to-compose RL workflows at both the temporal and spatial dimensions, and recomposes them into optimized execution flows. Supported by RLinf worker's adaptive communication capability, we devise context switching and elastic pipelining to realize M2Flow transformation, and a profiling-guided scheduling policy to generate optimal execution plans. Extensive evaluations on both reasoning RL and embodied RL tasks demonstrate that RLinf consistently outperforms state-of-the-art systems, achieving 1.1x-2.13x speedup in end-to-end training throughput.

 
