Improving Video Generation with Human Feedback

  • 2025-01-23 18:55:41
  • Jie Liu, Gongye Liu, Jiajun Liang, Ziyang Yuan, Xiaokun Liu, Mingwu Zheng, Xiele Wu, Qiulin Wang, Wenyu Qin, Menghan Xia, Xintao Wang, Xiaohong Liu, Fei Yang, Pengfei Wan, Di Zhang, Kun Gai, Yujiu Yang, Wanli Ouyang

Abstract

Video generation has achieved significant advances through rectified flow techniques, but issues such as unsmooth motion and misalignment between videos and prompts persist. In this work, we develop a systematic pipeline that harnesses human feedback to mitigate these problems and refine the video generation model. Specifically, we begin by constructing a large-scale human preference dataset focused on modern video generation models, with pairwise annotations across multiple dimensions. We then introduce VideoReward, a multi-dimensional video reward model, and examine how annotations and various design choices affect its effectiveness as a reward model. From a unified reinforcement learning perspective aimed at maximizing reward with KL regularization, we introduce three alignment algorithms for flow-based models by extending those from diffusion models. These include two training-time strategies, direct preference optimization for flow (Flow-DPO) and reward weighted regression for flow (Flow-RWR), and an inference-time technique, Flow-NRG, which applies reward guidance directly to noisy videos. Experimental results indicate that VideoReward significantly outperforms existing reward models, and that Flow-DPO outperforms both Flow-RWR and standard supervised fine-tuning. Additionally, Flow-NRG lets users assign custom weights to multiple objectives during inference, meeting personalized video quality needs. Project page: https://gongyeliu.github.io/videoalign.
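The abstract describes Flow-DPO only as an extension of a diffusion alignment method to flow models. The following is a minimal sketch of what such a preference loss could look like. It rests on assumptions that this page does not state. First, it assumes the rectified-flow interpolation x_t = (1 - t) x_0 + t ε, whose velocity target is ε - x_0. Second, it takes the Diffusion-DPO loss and swaps its noise-prediction errors for velocity-prediction errors, with any timestep weighting folded into β. The function names (`rectified_flow_sample`, `velocity_error`, `flow_dpo_loss`) are hypothetical, and plain Python lists stand in for video latents. This is an illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple


def rectified_flow_sample(x0: Sequence[float], noise: Sequence[float],
                          t: float) -> Tuple[List[float], List[float]]:
    """Return (x_t, v_target) under the ASSUMED interpolation
    x_t = (1 - t) * x0 + t * noise, whose time derivative is noise - x0."""
    x_t = [(1.0 - t) * a + t * e for a, e in zip(x0, noise)]
    v_target = [e - a for a, e in zip(x0, noise)]
    return x_t, v_target


def velocity_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between a predicted and a target velocity."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def flow_dpo_loss(err_policy_win: float, err_ref_win: float,
                  err_policy_lose: float, err_ref_lose: float,
                  beta: float) -> float:
    """Hypothetical Diffusion-DPO-style loss on velocity errors.

    Each err_* is a velocity_error for the preferred ("win") or rejected
    ("lose") video, from the trainable policy or the frozen reference.
    The loss drops below log(2) when the policy reduces its error on the
    preferred video (relative to the reference) more than on the rejected one.
    """
    margin = (err_policy_win - err_ref_win) - (err_policy_lose - err_ref_lose)
    return -log_sigmoid(-beta * margin)
```

In a training loop, the two errors for each video would come from the policy and the frozen reference, both evaluated on the same noised input x_t. When the policy and reference agree, the margin is zero and the loss equals log 2. The KL regularization mentioned in the abstract enters implicitly, through the reference-model terms and the strength β.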

 
