Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model

Abstract

This technical report presents a cost-efficient strategy for training a video generation foundation model. We present a mid-sized research model with approximately 7 billion parameters (7B), called Seaweed-7B, trained from scratch using 665,000 H100 GPU hours. Despite being trained with moderate computational resources, Seaweed-7B demonstrates highly competitive performance compared to contemporary video generation models of much larger size. Design choices are especially crucial in a resource-constrained setting. This technical report highlights the key design decisions that enhance the performance of the medium-sized diffusion model. Empirically, we make two observations: (1) Seaweed-7B achieves performance comparable to, or even surpassing, that of larger models trained on substantially greater GPU resources, and (2) our model, which exhibits strong generalization ability, can be effectively adapted across a wide range of downstream applications by either lightweight fine-tuning or continued training. See the project page at https://seaweed.video/
