Improving Video Generation with Human Feedback

Abstract

Video generation has achieved significant advances through rectified flow techniques, but issues like unsmooth motion and misalignment between videos and prompts persist. In this work, we develop a systematic pipeline that harnesses human feedback to mitigate these problems and refine the video generation model. Specifically, we begin by constructing a large-scale human preference dataset focused on modern video generation models, incorporating pairwise annotations across multi-dimensions. We then introduce VideoReward, a multi-dimensional video reward model, and examine how annotations and various design choices impact its rewarding efficacy. From a unified reinforcement learning perspective aimed at maximizing reward with KL regularization, we introduce three alignment algorithms for flow-based models by extending those from diffusion models. These include two training-time strategies: direct preference optimization for flow (Flow-DPO) and reward weighted regression for flow (Flow-RWR), and an inference-time technique, Flow-NRG, which applies reward guidance directly to noisy videos. Experimental results indicate that VideoReward significantly outperforms existing reward models, and Flow-DPO demonstrates superior performance compared to both Flow-RWR and standard supervised fine-tuning methods. Additionally, Flow-NRG lets users assign custom weights to multiple objectives during inference, meeting personalized video quality needs. Project page: https://gongyeliu.github.io/videoalign.
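
As a rough illustration of two ideas mentioned in the abstract, the toy sketch below combines per-dimension reward scores with user-chosen weights and then applies the standard closed-form solution of KL-regularized reward maximization over a finite candidate set, where the optimal distribution is proportional to the reference probability times exp(reward / beta). It is not the paper's implementation: the dimension names (`visual`, `motion`, `text`), the scores, the weights, and both helper functions are hypothetical, and the real methods operate on flow-based video models rather than on a discrete list of candidates.

```python
import math


def combine_rewards(scores, weights):
    """Weighted sum of per-dimension reward scores.

    The dimension names used below are hypothetical placeholders.
    """
    return sum(weights[name] * scores[name] for name in weights)


def kl_regularized_optimum(ref_probs, rewards, beta):
    """Maximizer of E_p[r] - beta * KL(p || p_ref) over a finite set.

    The solution is p*(x) proportional to p_ref(x) * exp(r(x) / beta).
    """
    unnormalized = [p * math.exp(r / beta) for p, r in zip(ref_probs, rewards)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Toy data (made up): two candidate videos scored on three dimensions.
candidates = [
    {"visual": 0.2, "motion": 0.9, "text": 0.5},
    {"visual": 0.8, "motion": 0.1, "text": 0.6},
]

# A user who cares most about motion quality.
weights = {"visual": 1.0, "motion": 2.0, "text": 1.0}
rewards = [combine_rewards(c, weights) for c in candidates]

# Uniform reference distribution, tilted toward higher combined reward.
ref_probs = [0.5, 0.5]
tilted = kl_regularized_optimum(ref_probs, rewards, beta=1.0)
print(rewards, tilted)
```

Changing `weights` shifts which candidate the tilted distribution favors, and a larger `beta` keeps the result closer to the reference distribution.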
