SeamlessFlow: A Trainer Agent Isolation RL Framework Achieving Bubble-Free Pipelines via Tag Scheduling

  • 2025-08-15 15:55:37
  • Jinghui Wang, Shaojie Wang, Yinghan Cui, Xuxing Chen, Chao Wang, Xiaojiang Zhang, Minglei Zhang, Jiarong Zhang, Wenhao Zhuang, Yuchen Cao, Wankang Bao, Haimo Li, Zheng Lin, Huiming Wang, Haoyang Huang, Zongxian Feng, Zizheng Zhan, Ken Deng, Wen Xiang, Huaixi Tang, Kun Wu, Mengtong Li, Mengfei Xie, Junyi Peng, Haotian Zhang, Bin Chen, Bing Yu

Abstract

We introduce SeamlessFlow, a server-based reinforcement learning (RL) framework that addresses two core challenges in industrial-scale RL: (1) decoupling RL training from the complex execution flow of agents; (2) maximizing GPU utilization with minimal idle time while preserving the stability and scalability required for large-scale deployments. First, SeamlessFlow introduces a data plane that decouples the RL trainer from diverse, complex agent implementations while sustaining high throughput. A central trajectory manager maintains complete interaction histories and supports partial rollout, allowing rollout to pause for weight updates and resume seamlessly, keeping agents unaware of service interruptions. Second, we propose a tag-driven scheduling paradigm that abstracts hardware into capability-tagged resources, unifying colocated and disaggregated architectures. Based on this, SeamlessFlow introduces a spatiotemporal multiplexing pipeline that dynamically reassigns idle training nodes to rollout in a train-rollout separated setup, eliminating pipeline bubbles and fully exploiting heterogeneous cluster resources. By combining these innovations, SeamlessFlow delivers both stability and high performance, making it well suited for multi-agent, long-horizon, and other complex RL tasks.
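
The idea of capability-tagged resources can be illustrated with a toy sketch. This is not SeamlessFlow's implementation or API: the `Node` class, the `assign` and `release` helpers, the tag names "train" and "rollout", and the node names are all hypothetical, chosen only to show how nodes tagged for both roles might serve rollout while no training job needs them.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical resource record: a name plus a set of capability tags.
    name: str
    tags: frozenset
    busy: bool = False


def assign(nodes, needed_tag):
    """Claim every idle node that carries the needed capability tag."""
    chosen = [n for n in nodes if needed_tag in n.tags and not n.busy]
    for n in chosen:
        n.busy = True
    return chosen


def release(nodes):
    """Mark the given nodes as idle again."""
    for n in nodes:
        n.busy = False


cluster = [
    Node("gpu-0", frozenset({"rollout"})),           # rollout-only node
    Node("gpu-1", frozenset({"train", "rollout"})),  # can serve either role
    Node("gpu-2", frozenset({"train", "rollout"})),
]

# Rollout phase: no training job is running, so the training-capable
# nodes are idle and get claimed for rollout as well.
rollout_workers = assign(cluster, "rollout")
rollout_names = [n.name for n in rollout_workers]

# Training phase: hand the train-tagged nodes back, then claim them for
# training. The rollout-only node keeps generating in the meantime.
release([n for n in rollout_workers if "train" in n.tags])
train_workers = assign(cluster, "train")
train_names = [n.name for n in train_workers]
still_rolling = [n.name for n in cluster if n.busy and n.name not in train_names]

print(rollout_names, train_names, still_rolling)
```

In this sketch a colocated setup would simply tag every node with both capabilities, while a fully disaggregated setup would give each node exactly one tag; the scheduling code stays the same in both cases.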

 
