MAViS: A Multi-Agent Framework for Long-Sequence Video Storytelling

  • 2025-08-20 14:50:55
  • Qian Wang, Ziqi Huang, Ruoxi Jia, Paul Debevec, Ning Yu

Abstract

Despite recent advances, long-sequence video generation frameworks still suffer from significant limitations: poor assistive capability, suboptimal visual quality, and limited expressiveness. To mitigate these limitations, we propose MAViS, an end-to-end multi-agent collaborative framework for long-sequence video storytelling. MAViS orchestrates specialized agents across multiple stages, including script writing, shot designing, character modeling, keyframe generation, video animation, and audio generation. In each stage, agents operate under the 3E Principle -- Explore, Examine, and Enhance -- to ensure the completeness of intermediate outputs. Considering the capability limitations of current generative models, we propose the Script Writing Guidelines to optimize compatibility between scripts and generative tools. Experimental results demonstrate that MAViS achieves state-of-the-art performance in assistive capability, visual quality, and video expressiveness. Its modular framework further enables scalability with diverse generative models and tools. With just a brief user prompt, MAViS is capable of producing high-quality, expressive long-sequence video storytelling, enriching inspirations and creativity for users. To the best of our knowledge, MAViS is the only framework that provides multimodal design output -- videos with narratives and background music.
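
The staged structure described in the abstract can be pictured with a small sketch. This is not the authors' implementation: only the six stage names and the Explore/Examine/Enhance labels come from the abstract. The function names (run_stage, run_pipeline), the callback signatures, the max_rounds cap, and the toy agents are all assumptions made for illustration, with plain strings standing in for scripts, keyframes, and clips.

```python
from typing import Callable, List, Tuple

# Stage names are taken from the abstract; everything else here is hypothetical.
STAGES = [
    "script writing",
    "shot designing",
    "character modeling",
    "keyframe generation",
    "video animation",
    "audio generation",
]

Explore = Callable[[str, str], str]               # (stage, upstream) -> draft
Examine = Callable[[str, str], Tuple[bool, str]]  # (stage, draft) -> (ok, feedback)
Enhance = Callable[[str, str, str], str]          # (stage, draft, feedback) -> draft


def run_stage(stage: str, upstream: str, explore: Explore, examine: Examine,
              enhance: Enhance, max_rounds: int = 3) -> str:
    """Produce one stage's output with an explore/examine/enhance loop."""
    draft = explore(stage, upstream)              # Explore: propose a candidate
    for _ in range(max_rounds):                   # assumed cap on review rounds
        ok, feedback = examine(stage, draft)      # Examine: review the candidate
        if ok:
            break
        draft = enhance(stage, draft, feedback)   # Enhance: revise using feedback
    return draft


def run_pipeline(prompt: str, explore: Explore, examine: Examine,
                 enhance: Enhance) -> List[str]:
    """Run the stages in order, feeding each stage's output to the next."""
    outputs: List[str] = []
    upstream = prompt
    for stage in STAGES:
        upstream = run_stage(stage, upstream, explore, examine, enhance)
        outputs.append(upstream)
    return outputs


# Toy stand-ins for real agents, so the control flow can be exercised.
def toy_explore(stage: str, upstream: str) -> str:
    return f"{stage} draft"


def toy_examine(stage: str, draft: str) -> Tuple[bool, str]:
    if draft.endswith("(revised)"):
        return True, ""
    return False, "needs revision"


def toy_enhance(stage: str, draft: str, feedback: str) -> str:
    return draft + " (revised)"
```

With the toy agents, run_pipeline("a fox crosses a river", toy_explore, toy_examine, toy_enhance) returns one string per stage, each revised once before passing review. In a real system the three callbacks would wrap generative models or tools, and the artifacts passed between stages would be structured objects rather than strings.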

 
