Controllable Longer Image Animation with Diffusion Models

Abstract

Generating realistic animated videos from static images is an important area of research in computer vision. Methods based on physical simulation and motion prediction have achieved notable advances, but they are often limited to specific object textures and motion trajectories and fail to capture highly complex environments and physical dynamics. In this paper, we introduce an open-domain controllable image animation method that uses motion priors with video diffusion models. Our method achieves precise control over the direction and speed of motion in the movable region by extracting motion field information from videos and learning moving trajectories and strengths. Current pretrained video generation models are typically limited to producing very short videos, usually fewer than 30 frames. In contrast, we propose an efficient long-duration video generation method based on noise rescheduling, specifically tailored for image animation tasks, which facilitates the creation of videos over 100 frames in length while maintaining consistency in scene content and motion coordination. Specifically, we decompose the denoising process into two distinct phases: the shaping of scene contours and the refining of motion details. We then reschedule the noise to control the generated frame sequences while maintaining long-distance noise correlation. We conducted extensive experiments against 10 baselines, encompassing both commercial tools and academic methodologies, which demonstrate the superiority of our method. Our project page: https://wangqiang9.github.io/Controllable.github.io/
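
To make the idea of long-distance noise correlation concrete, the following is a minimal sketch of one generic way to draw per-frame noise that stays correlated across an arbitrarily long sequence. It is not the noise rescheduling procedure of this paper, whose details the abstract does not give. It mixes a single shared Gaussian sample with an independent sample per frame, so that every frame keeps unit variance while any two frames, however far apart, have correlation `alpha`. The names `correlated_frame_noise`, `pearson`, `latent_size`, and `alpha` are hypothetical and chosen only for illustration.

```python
import math
import random


def correlated_frame_noise(num_frames, latent_size, alpha=0.5, seed=0):
    """Return num_frames noise vectors of length latent_size.

    Illustrative assumption, not the paper's method: each frame is
    sqrt(alpha) * shared + sqrt(1 - alpha) * independent, which gives
    unit variance per entry and correlation alpha between any two frames.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    rng = random.Random(seed)
    # One sample reused by every frame: the source of long-range correlation.
    shared = [rng.gauss(0.0, 1.0) for _ in range(latent_size)]
    a, b = math.sqrt(alpha), math.sqrt(1.0 - alpha)
    # A fresh independent sample per frame keeps the frames from being identical.
    return [
        [a * s + b * rng.gauss(0.0, 1.0) for s in shared]
        for _ in range(num_frames)
    ]


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    vx = sum((xi - mx) ** 2 for xi in x)
    vy = sum((yi - my) ** 2 for yi in y)
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    frames = correlated_frame_noise(num_frames=120, latent_size=4096, alpha=0.6)
    # The first and last frames are 119 steps apart yet remain correlated.
    print(round(pearson(frames[0], frames[-1]), 2))
```

In this sketch, `alpha = 0` yields fully independent per-frame noise and `alpha = 1` yields identical noise for every frame; intermediate values trade temporal consistency against per-frame variation.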
