Abstract
This paper proposes MotionScript, a motion-to-text conversion algorithm and natural language representation for human body motions. MotionScript provides more detailed and accurate descriptions of human body movements than previous natural language methods. Most motion datasets focus on basic, well-defined actions with limited variation in expression (e.g., sitting, walking, dribbling a ball). However, expressive actions that span a diversity of movements within a class (e.g., being sad, dancing) and actions outside the domain of standard motion capture datasets (e.g., stylistic walking, sign language, interactions with animals) call for more specific and granular natural language descriptions. Our proposed MotionScript descriptions differ from existing natural language representations in that they provide detailed descriptions in natural language rather than simple action labels or generalized captions. To the best of our knowledge, this is the first attempt at translating 3D motions to natural language descriptions without requiring training data. Our experiments demonstrate that MotionScript descriptions, when applied to text-to-motion tasks, enable large language models to generate complex, previously unseen motions. Additional examples, the dataset, and code are available at https://pjyazdian.github.io/MotionScript