Generative Video Propagation

Abstract

Large-scale video generation models have the inherent ability to realistically model natural scenes. In this paper, we demonstrate that through a careful design of a generative video propagation framework, various video tasks can be addressed in a unified way by leveraging the generative power of such models. Specifically, our framework, GenProp, encodes the original video with a selective content encoder and propagates the changes made to the first frame using an image-to-video generation model. We propose a data generation scheme, built on instance-level video segmentation datasets, that covers multiple video tasks. Our model is trained with an additional mask prediction decoder head and a region-aware loss, which help the encoder preserve the original content while the generation model propagates the modified region. This design opens up new possibilities: in editing scenarios, GenProp allows substantial changes to an object's shape; for insertion, the inserted objects can exhibit independent motion; for removal, GenProp effectively removes effects such as shadows and reflections from the whole video; for tracking, GenProp can track objects together with their associated effects. Experimental results demonstrate the leading performance of our model across various video tasks, and we further provide in-depth analyses of the proposed framework.
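The abstract does not define the region-aware loss, so the following is only a minimal sketch of one plausible form, not GenProp's actual formulation. It assumes a binary per-pixel mask marking the modified region and averages the squared error separately inside and outside that mask, so a small edited region is not swamped by the large unchanged background. The function name `region_aware_loss` and the weights `w_in` / `w_out` are hypothetical.

```python
from typing import List

# Hypothetical representation: one grayscale frame as rows of pixel values.
Frame = List[List[float]]


def region_aware_loss(pred: List[Frame], target: List[Frame], mask: List[Frame],
                      w_in: float = 1.0, w_out: float = 1.0) -> float:
    """Hypothetical region-aware loss (an assumption, not the paper's definition).

    Squared error is averaged separately over pixels inside the mask (the
    modified region) and outside it (the content to preserve); the two
    region means are then combined with the weights w_in and w_out.
    """
    err_in = err_out = 0.0
    n_in = n_out = 0
    for p_frame, t_frame, m_frame in zip(pred, target, mask):
        for p_row, t_row, m_row in zip(p_frame, t_frame, m_frame):
            for p, t, m in zip(p_row, t_row, m_row):
                sq = (p - t) ** 2
                if m > 0.5:  # pixel belongs to the modified region
                    err_in += sq
                    n_in += 1
                else:  # pixel belongs to the preserved region
                    err_out += sq
                    n_out += 1
    # Guard against an empty region (e.g. a mask that covers nothing).
    loss_in = err_in / n_in if n_in else 0.0
    loss_out = err_out / n_out if n_out else 0.0
    return w_in * loss_in + w_out * loss_out


# One 2x2 frame whose single masked pixel is wrong by 1.0.
pred = [[[1.0, 0.0], [0.0, 0.0]]]
target = [[[0.0, 0.0], [0.0, 0.0]]]
mask = [[[1, 0], [0, 0]]]
print(region_aware_loss(pred, target, mask))
```

In this toy case a plain per-pixel MSE would report 0.25, while the region-wise average reports 1.0, which illustrates why separating the regions can keep errors in a small edited area from being diluted.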
