DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance

Abstract

While recent image-based human animation methods achieve realistic body and facial motion synthesis, critical gaps remain in fine-grained holistic controllability, multi-scale adaptability, and long-term temporal coherence, which leads to their lower expressiveness and robustness. We propose a diffusion transformer (DiT) based framework, DreamActor-M1, with hybrid guidance to overcome these limitations. For motion guidance, our hybrid control signals that integrate implicit facial representations, 3D head spheres, and 3D body skeletons achieve robust control of facial expressions and body movements, while producing expressive and identity-preserving animations. For scale adaptation, to handle various body poses and image scales ranging from portraits to full-body views, we employ a progressive training strategy using data with varying resolutions and scales. For appearance guidance, we integrate motion patterns from sequential frames with complementary visual references, ensuring long-term temporal coherence for unseen regions during complex movements. Experiments demonstrate that our method outperforms the state-of-the-art works, delivering expressive results for portraits, upper-body, and full-body generation with robust long-term consistency. Project Page: https://grisoon.github.io/DreamActor-M1/.
