Abstract
Real-world robotic manipulation in homes and factories demands reliability, efficiency, and robustness that approach or surpass those of skilled human operators. We present RL-100, a real-world reinforcement learning training framework built on diffusion visuomotor policies trained by supervised learning. RL-100 introduces a three-stage pipeline. First, imitation learning leverages human priors. Second, iterative offline reinforcement learning uses an Offline Policy Evaluation (OPE) procedure to gate PPO-style updates that are applied in the denoising process for conservative and reliable improvement. Third, online reinforcement learning eliminates residual failure modes. An additional lightweight consistency distillation head compresses the multi-step sampling process in diffusion into a single-step policy, enabling high-frequency control with an order-of-magnitude reduction in latency while preserving task performance. The framework is task-, embodiment-, and representation-agnostic and supports 3D point-cloud and 2D RGB inputs, a variety of robot platforms, and both single-step and action-chunk policies. We evaluate RL-100 on seven real-robot tasks spanning dynamic rigid-body control (such as Push-T and Agile Bowling), fluid and granular pouring, deformable cloth folding, precise dexterous unscrewing, and multi-stage orange juicing. RL-100 attains 100\% success across all evaluated trials, 900 out of 900 episodes in total, including up to 250 out of 250 consecutive trials on one task. The method achieves time efficiency close to or better than that of human teleoperation and demonstrates multi-hour robustness, with uninterrupted operation lasting up to two hours.