SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation

  • 2024-10-31 19:03:51
  • Yining Hong, Beide Liu, Maxine Wu, Yuanhao Zhai, Kai-Wei Chang, Linjie Li, Kevin Lin, Chung-Ching Lin, Jianfeng Wang, Zhengyuan Yang, Yingnian Wu, Lijuan Wang

Abstract

Human beings are endowed with a complementary learning system, which bridges the slow learning of general world dynamics with fast storage of episodic memory from a new experience. Previous video generation models, however, primarily focus on slow learning by pre-training on vast amounts of data, overlooking the fast learning phase crucial for episodic memory storage. This oversight leads to inconsistencies across temporally distant frames when generating longer videos, as these frames fall beyond the model's context window. To this end, we introduce SlowFast-VGen, a novel dual-speed learning system for action-driven long video generation. Our approach incorporates a masked conditional video diffusion model for the slow learning of world dynamics, alongside an inference-time fast learning strategy based on a temporal LoRA module. Specifically, the fast learning process updates its temporal LoRA parameters based on local inputs and outputs, thereby efficiently storing episodic memory in its parameters. We further propose a slow-fast learning loop algorithm that seamlessly integrates the inner fast learning loop into the outer slow learning loop, enabling the recall of prior multi-episode experiences for context-aware skill learning. To facilitate the slow learning of an approximate world model, we collect a large-scale dataset of 200k videos with language action annotations, covering a wide range of scenarios. Extensive experiments show that SlowFast-VGen outperforms baselines across various metrics for action-driven video generation, achieving an FVD score of 514 compared to 782, and maintaining consistency in longer videos, with an average of 0.37 scene cuts versus 0.89. The slow-fast learning loop algorithm significantly enhances performance on long-horizon planning tasks as well. Project Website: https://slowfast-vgen.github.io
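
As a rough illustration of the fast learning idea described in the abstract (updating only low-rank LoRA parameters at inference time from a local input/output pair, while the pre-trained weights stay frozen), the toy sketch below may help. It is not the paper's implementation: the abstract describes a temporal LoRA module inside a video diffusion model, whereas this sketch assumes a single linear layer, a squared-error loss, and plain gradient descent. The names `LoRALinear`, `fast_update`, and `adapt` are hypothetical.

```python
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class LoRALinear:
    """Toy layer: frozen base weight W plus a trainable low-rank update B @ A."""

    def __init__(self, W, rank, seed=0):
        rng = random.Random(seed)
        self.W = [row[:] for row in W]  # "slow" weights, never updated here
        out_dim, in_dim = len(W), len(W[0])
        # A gets a small random init; B starts at zero, so the layer
        # initially reproduces the frozen base output exactly.
        self.A = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]

    def forward(self, x):
        h = matvec(self.A, x)        # project input to the low-rank space
        base = matvec(self.W, x)     # frozen path
        delta = matvec(self.B, h)    # low-rank correction
        return [b + d for b, d in zip(base, delta)]

    def fast_update(self, x, target, lr):
        """One gradient step on 0.5 * ||y - target||^2, touching only A and B."""
        h = matvec(self.A, x)
        y = self.forward(x)
        err = [yi - ti for yi, ti in zip(y, target)]
        # Gradient w.r.t. h, computed with the current B before it is changed.
        g_h = [sum(self.B[i][r] * err[i] for i in range(len(err)))
               for r in range(len(h))]
        for i in range(len(self.B)):
            for r in range(len(h)):
                self.B[i][r] -= lr * err[i] * h[r]
        for r in range(len(self.A)):
            for j in range(len(x)):
                self.A[r][j] -= lr * g_h[r] * x[j]
        return 0.5 * sum(e * e for e in err)  # loss measured before this step


def adapt(layer, x, target, steps, lr):
    """Repeat the fast update on one local (input, output) pair."""
    return [layer.fast_update(x, target, lr) for _ in range(steps)]
```

In this toy setting the "episodic memory" is whatever the low-rank factors `A` and `B` absorb from the observed pair; the frozen `W` stands in for the slowly learned model.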

 
