ContentV: Efficient Training of Video Generation Models with Limited Compute

  • 2025-06-05 18:59:54
  • Wenfeng Lin, Renjie Chen, Boyuan Liu, Shiyue Yan, Ruoyu Feng, Jiangchuan Wei, Yichen Zhang, Yimeng Zhou, Chao Feng, Jiao Ran, Qi Wu, Zuotao Liu, Mingyu Guo

Abstract

Recent advances in video generation demand increasingly efficient training recipes to mitigate escalating computational costs. In this report, we present ContentV, an 8B-parameter text-to-video model that achieves state-of-the-art performance (85.14 on VBench) after training on 256 × 64GB Neural Processing Units (NPUs) for merely four weeks. ContentV generates diverse, high-quality videos across multiple resolutions and durations from text prompts, enabled by three key innovations: (1) a minimalist architecture that maximizes reuse of pre-trained image generation models for video generation; (2) a systematic multi-stage training strategy leveraging flow matching for enhanced efficiency; and (3) a cost-effective reinforcement learning with human feedback framework that improves generation quality without requiring additional human annotations. All the code and models are available at: https://contentv.github.io.

 
