RoboGPT-R1: Enhancing Robot Planning with Reinforcement Learning

  • 2025-10-22 13:03:47
  • Jinrui Liu, Bingyan Nie, Boyu Li, Yaran Chen, Yuze Wang, Shunsen He, Haoran Li

Abstract

Improving the reasoning capabilities of embodied agents is crucial for robots to successfully complete complex human instructions in long-horizon manipulation tasks. Despite the success of large language models and vision language models based on Supervised Fine-Tuning (SFT) in planning tasks, they continue to face challenges in performing long-horizon manipulation tasks in complex real-world environments, owing to their restricted common sense and reasoning capabilities. Considering that aligning general-purpose vision language models to robotic planning tasks via supervised fine-tuning suffers from poor generalization and insufficient physical understanding, we propose RoboGPT-R1, a two-stage fine-tuning framework for embodied planning. In this framework, supervised training acquires foundational knowledge through expert sequences, followed by reinforcement learning (RL) to address the model's shortcomings in visual-spatial understanding and reasoning. To achieve physical understanding and action sequence consistency in multi-step reasoning tasks, we design a rule-based reward function that simultaneously considers long-horizon performance and action constraints in the environment. The reasoning model, trained on Qwen2.5-VL-3B, significantly outperforms the larger-scale model, GPT-4o-mini, by 21.33% and surpasses other work trained on Qwen2.5-VL-7B by 20.33% on the EmbodiedBench benchmark.
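The abstract does not spell out the reward, so the following is only an illustrative sketch of what a rule-based reward combining long-horizon performance and action constraints could look like. It is not the paper's actual reward function. The names `prefix_match_length`, `rule_based_reward`, `valid_actions`, and `validity_weight`, as well as the choice of prefix matching and a weighted sum, are all assumptions made for illustration.

```python
from typing import Sequence, Set


def prefix_match_length(predicted: Sequence[str], expert: Sequence[str]) -> int:
    """Count how many leading actions agree with the expert plan."""
    count = 0
    for pred_action, expert_action in zip(predicted, expert):
        if pred_action != expert_action:
            break
        count += 1
    return count


def rule_based_reward(
    predicted: Sequence[str],
    expert: Sequence[str],
    valid_actions: Set[str],
    validity_weight: float = 0.5,  # hypothetical weighting, not from the paper
) -> float:
    """Hypothetical reward in [0, 1] mixing two rule-based terms.

    progress:       fraction of the expert plan matched as a correct prefix,
                    a stand-in for long-horizon performance.
    valid_fraction: fraction of predicted actions that the environment allows,
                    a stand-in for action constraints.
    """
    if not predicted or not expert:
        return 0.0
    progress = prefix_match_length(predicted, expert) / len(expert)
    valid_fraction = sum(a in valid_actions for a in predicted) / len(predicted)
    return (1.0 - validity_weight) * progress + validity_weight * valid_fraction


if __name__ == "__main__":
    # Toy plan and action set, invented for this example.
    expert_plan = ["find mug", "pick up mug", "find sink", "put mug in sink"]
    allowed = set(expert_plan) | {"find table"}
    partial_plan = ["find mug", "pick up mug", "fly to moon"]
    print(rule_based_reward(expert_plan, expert_plan, allowed))
    print(rule_based_reward(partial_plan, expert_plan, allowed))
```

In this sketch a plan that reproduces the expert sequence with only allowed actions scores 1.0, while a plan that diverges early or uses actions outside the allowed set is penalized on the corresponding term. A reward of this shape can be computed from rules alone, without a learned reward model, which is the property the abstract emphasizes.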

 
