MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm

Abstract

Human motion generation and editing are key components of computer graphics and vision. However, current approaches in this field tend to offer isolated solutions tailored to specific tasks, which can be inefficient and impractical for real-world applications. While some efforts have aimed to unify motion-related tasks, these methods simply use different modalities as conditions to guide motion generation. Consequently, they lack editing capabilities and fine-grained control, and they fail to facilitate knowledge sharing across tasks. To address these limitations and provide a versatile, unified framework capable of handling both human motion generation and editing, we introduce a novel paradigm, Motion-Condition-Motion, which enables the unified formulation of diverse tasks with three concepts: source motion, condition, and target motion. Based on this paradigm, we propose a unified framework, MotionLab, which incorporates rectified flows to learn the mapping from source motion to target motion, guided by the specified conditions. In MotionLab, we introduce 1) the MotionFlow Transformer to enhance conditional generation and editing without task-specific modules; 2) Aligned Rotational Position Encoding to guarantee temporal synchronization between source motion and target motion; 3) Task Specified Instruction Modulation; and 4) Motion Curriculum Learning for effective multi-task learning and knowledge sharing across tasks. Notably, MotionLab demonstrates promising generalization capabilities and inference efficiency across multiple benchmarks for human motion. Our code and additional video results are available at: https://diouo.github.io/motionlab.github.io/.
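The abstract states that MotionLab uses rectified flows to map source motion to target motion under a condition. The sketch below illustrates only the generic rectified-flow idea, not the authors' implementation: samples move along the straight path x_t = (1 - t) x0 + t x1, a model is trained to predict the constant velocity x1 - x0, and sampling integrates that velocity with Euler steps. Following the abstract's phrasing, it assumes x0 is the source motion; the paper's actual parameterization (for example, starting from noise and feeding the source motion in as a condition) may differ. The names `velocity_fn`, `cond`, and the oracle predictor are hypothetical stand-ins for a trained network such as the MotionFlow Transformer.

```python
import numpy as np

def interpolate(x0, x1, t):
    """Point on the straight path between start x0 and target x1 at time t in [0, 1]."""
    return (1.0 - t) * x0 + t * x1

def rectified_flow_loss(velocity_fn, x0, x1, t, cond):
    """MSE between the predicted velocity and the constant path velocity x1 - x0.

    velocity_fn(x, t, cond) is a hypothetical model interface (assumption),
    standing in for a conditional network.
    """
    xt = interpolate(x0, x1, t)
    target_v = x1 - x0
    pred_v = velocity_fn(xt, t, cond)
    return float(np.mean((pred_v - target_v) ** 2))

def euler_sample(velocity_fn, x0, cond, steps=10):
    """Integrate dx/dt = v(x, t, cond) from t = 0 to t = 1 with fixed-step Euler."""
    x = x0.copy()
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t, cond)
    return x

# Toy usage: 4 frames x 3 features (hypothetical shapes, not real motion data).
rng = np.random.default_rng(0)
source = rng.normal(size=(4, 3))
target = source + 1.0
cond = target - source  # hypothetical condition: here it simply encodes the displacement

def oracle(x, t, c):
    """Hypothetical perfect predictor: returns the true straight-line velocity."""
    return c

loss = rectified_flow_loss(oracle, source, target, 0.3, cond)
generated = euler_sample(oracle, source, cond, steps=10)
```

Because the training target is a straight path with constant velocity, a well-fit model can be integrated in few steps, which is one commonly cited reason rectified flows suit efficient inference; with the oracle above, the loss is zero and Euler sampling reaches the target.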
