GR-3 Technical Report

  • 2025-07-22 15:04:37
  • Chilam Cheang, Sijin Chen, Zhongren Cui, Yingdong Hu, Liqun Huang, Tao Kong, Hang Li, Yifeng Li, Yuxiao Liu, Xiao Ma, Hao Niu, Wenxuan Ou, Wanli Peng, Zeyu Ren, Haixin Shi, Jiawen Tian, Hongtao Wu, Xin Xiao, Yuyang Xiao, Jiafeng Xu, Yichu Yang

Abstract

We report our recent progress towards building generalist robot policies: the development of GR-3. GR-3 is a large-scale vision-language-action (VLA) model. It showcases exceptional capabilities in generalizing to novel objects, environments, and instructions involving abstract concepts. Furthermore, it can be efficiently fine-tuned with minimal human trajectory data, enabling rapid and cost-effective adaptation to new settings. GR-3 also excels in handling long-horizon and dexterous tasks, including those requiring bi-manual manipulation and mobile movement, demonstrating robust and reliable performance. These capabilities are achieved through a multi-faceted training recipe that includes co-training with web-scale vision-language data, efficient fine-tuning from human trajectory data collected via VR devices, and effective imitation learning with robot trajectory data. In addition, we introduce ByteMini, a versatile bi-manual mobile robot designed with exceptional flexibility and reliability, capable of accomplishing a wide range of tasks when integrated with GR-3. Through extensive real-world experiments, we show that GR-3 surpasses the state-of-the-art baseline method, $\pi_0$, on a wide variety of challenging tasks. We hope GR-3 can serve as a step towards building generalist robots capable of assisting humans in daily life.
