Motion-R1: Chain-of-Thought Reasoning and Reinforcement Learning for Human Motion Generation

Abstract

Recent advances in large language models, especially in natural language understanding and reasoning, have opened new possibilities for text-to-motion generation. Although existing approaches have made notable progress in semantic alignment and motion synthesis, they often rely on end-to-end mapping strategies that fail to capture deep linguistic structures and logical reasoning. Consequently, generated motions tend to lack controllability, consistency, and diversity. To address these limitations, we propose Motion-R1, a unified motion-language modeling framework that integrates a Chain-of-Thought mechanism. By explicitly decomposing complex textual instructions into logically structured action paths, Motion-R1 provides high-level semantic guidance for motion generation, significantly enhancing the model's ability to interpret and execute multi-step, long-horizon, and compositionally rich commands. To train our model, we adopt Group Relative Policy Optimization, a reinforcement learning algorithm designed for large models, which leverages motion quality feedback to jointly optimize reasoning chains and motion synthesis. Extensive experiments across multiple benchmark datasets demonstrate that Motion-R1 achieves competitive or superior performance compared to state-of-the-art methods, particularly in scenarios requiring nuanced semantic understanding and long-term temporal coherence. The code, model, and data will be publicly available.
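Group Relative Policy Optimization, as generally formulated, does without a learned value baseline. Instead, it samples a group of outputs for the same prompt and scores each output against the mean and standard deviation of its own group's rewards. The snippet below is a minimal sketch of that group-relative advantage and the PPO-style clipped term, not Motion-R1's implementation. It makes several assumptions. The reward values are hypothetical stand-ins for the motion-quality feedback described above. Normalization uses the population standard deviation. Each output has a single sequence-level probability ratio. The KL penalty to a reference policy is omitted.

```python
import math
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward by the mean and (population) std of its sampling group."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    # eps guards against division by zero when all rewards in the group are equal
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO-style clipped objective term for one sampled output (to be maximized).

    `ratio` is pi_new(output) / pi_old(output); here assumed to be sequence-level.
    """
    clipped_ratio = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Hypothetical motion-quality rewards for G = 4 sampled (reasoning chain, motion) outputs
    rewards = [0.9, 0.5, 0.7, 0.3]
    advantages = group_relative_advantages(rewards)
    print([round(a, 3) for a in advantages])
    # Objective contribution of the first output if its probability rose by 50%
    print(clipped_surrogate(1.5, advantages[0]))
```

Outputs that score above their group's average receive positive advantages and are reinforced. Outputs below the average are pushed down. Because the baseline is computed per prompt, no separate critic network is needed. This is the main reason GRPO is attractive for training large models.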
