Abstract
Autoregressive video diffusion models are capable of long rollouts that are stable and consistent with history, but they are unable to guide the current generation with conditioning from the future. In camera-guided video generation with a predefined camera trajectory, this limitation leads to collisions with the generated scene, after which autoregression quickly collapses. To address this, we propose Generative View Stitching (GVS), which samples the entire sequence in parallel such that the generated scene is faithful to every part of the predefined camera trajectory. Our main contribution is a sampling algorithm that extends prior work on diffusion stitching for robot planning to video generation. While such stitching methods usually require a specially trained model, GVS is compatible with any off-the-shelf video model trained with Diffusion Forcing, a prevalent sequence diffusion framework that we show already provides the affordances necessary for stitching. We then introduce Omni Guidance, a technique that enhances the temporal consistency in stitching by conditioning on both the past and future, and that enables our proposed loop-closing mechanism for delivering long-range coherence. Overall, GVS achieves camera-guided video generation that is stable, collision-free, frame-to-frame consistent, and closes loops for a variety of predefined camera paths, including Oscar Reutersvärd's Impossible Staircase. Results are best viewed as videos at https://andrewsonga.github.io/gvs.