Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise

Abstract

Generative modeling aims to transform random noise into structured outputs. In this work, we enhance video diffusion models by allowing motion control via structured latent noise sampling. This is achieved by just a change in data: we pre-process training videos to yield structured noise. Consequently, our method is agnostic to diffusion model design, requiring no changes to model architectures or training pipelines. Specifically, we propose a novel noise warping algorithm, fast enough to run in real time, that replaces random temporal Gaussianity with correlated warped noise derived from optical flow fields, while preserving the spatial Gaussianity. The efficiency of our algorithm enables us to fine-tune modern video diffusion base models using warped noise with minimal overhead, and provide a one-stop solution for a wide range of user-friendly motion control: local object motion control, global camera movement control, and motion transfer. The harmonization between temporal coherence and spatial Gaussianity in our warped noise leads to effective motion control while maintaining per-frame pixel quality. Extensive experiments and user studies demonstrate the advantages of our method, making it a robust and scalable approach for controlling motion in video diffusion models. Video results are available on our webpage: https://vgenai-netflix-eyeline-research.github.io/Go-with-the-Flow/; source code and model checkpoints are available on GitHub: https://github.com/VGenAI-Netflix-Eyeline-Research/Go-with-the-Flow.
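To make the idea of flow-warped noise that stays spatially Gaussian more concrete, the following is a minimal NumPy sketch. It is an illustration under stated assumptions, not the algorithm proposed in the paper: the function name `warp_noise`, the backward-flow convention (each target pixel stores the `(dx, dy)` offset to its source pixel), and the nearest-neighbor rounding are all hypothetical choices made for this example. When several target pixels pull from the same source pixel, the source value is shared among them and fresh zero-mean noise is added, so the output pixels remain independent with unit variance while still being tied to the previous frame.

```python
import numpy as np

def warp_noise(prev_noise, flow, rng):
    """Warp an (H, W) Gaussian noise frame along a backward flow field.

    flow has shape (H, W, 2) and holds (dx, dy) offsets from each target
    pixel to its source pixel (an assumed convention for this sketch).
    """
    h, w = prev_noise.shape
    ys, xs = np.mgrid[0:h, 0:w]

    # Nearest-neighbor backward lookup, clamped to the image bounds.
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    src = (src_y * w + src_x).ravel()

    # k = how many target pixels share each target's source pixel.
    counts = np.bincount(src, minlength=h * w)
    k = counts[src]

    # Fresh noise with its per-group mean removed. Adding it to the shared
    # value x / sqrt(k) gives each target unit variance and zero covariance
    # with the other targets in its group.
    z = rng.standard_normal(h * w)
    group_mean = np.bincount(src, weights=z, minlength=h * w)[src] / k
    out = prev_noise.ravel()[src] / np.sqrt(k) + (z - group_mean)
    return out.reshape(h, w)
```

With a zero flow the function returns the previous frame's noise unchanged, which gives perfect temporal correlation. With a non-zero flow, pixels that map one-to-one simply carry their source value along the motion, and only pixels that share a source receive fresh randomness. Source pixels that no target pulls from are dropped in this sketch.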
