Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion

Abstract

We introduce Videoshop, a training-free video editing algorithm for localized semantic edits. Videoshop allows users to use any editing software, including Photoshop and generative inpainting, to modify the first frame; it automatically propagates those changes, with semantically, spatially, and temporally consistent motion, to the remaining frames. Unlike existing methods that enable edits only through imprecise textual instructions, Videoshop allows users to add or remove objects, semantically change objects, insert stock photos into videos, etc., with fine-grained control over location and appearance. We achieve this through image-based video editing by inverting latents with noise extrapolation, from which we generate videos conditioned on the edited image. Videoshop produces higher-quality edits than 6 baselines on 2 editing benchmarks using 10 evaluation metrics.
