Abstract
Video generation has achieved significant advances through rectified flow techniques, but issues like non-smooth motion and misalignment between videos and prompts persist. In this work, we develop a systematic pipeline that harnesses human feedback to mitigate these problems and refine the video generation model. Specifically, we begin by constructing a large-scale human preference dataset focused on modern video generation models, incorporating pairwise annotations across multiple dimensions. We then introduce VideoReward, a multi-dimensional video reward model, and examine how annotations and various design choices impact its rewarding efficacy. From a unified reinforcement learning perspective aimed at maximizing reward with KL regularization, we introduce three alignment algorithms for flow-based models by extending those from diffusion models. These include two training-time strategies, direct preference optimization for flow (Flow-DPO) and reward weighted regression for flow (Flow-RWR), and an inference-time technique, Flow-NRG, which applies reward guidance directly to noisy videos. Experimental results indicate that VideoReward significantly outperforms existing reward models, and Flow-DPO demonstrates superior performance compared to both Flow-RWR and standard supervised fine-tuning methods. Additionally, Flow-NRG lets users assign custom weights to multiple objectives during inference, meeting personalized video quality needs. Project page: https://gongyeliu.github.io/videoalign.
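
For orientation, the KL-regularized reward maximization mentioned above can be sketched in its commonly used form. This is an illustrative sketch under assumed notation, not necessarily the exact formulation used in the paper: $y$ denotes a prompt, $x$ a generated video, $r(x, y)$ the reward, $p_\theta$ the model being aligned, $p_{\mathrm{ref}}$ a frozen reference model, and $\beta > 0$ the regularization strength. None of these symbols appear in the abstract itself.

```latex
% Assumed notation (not from the abstract): y = prompt, x = video,
% r = reward model, p_ref = frozen reference model, beta = KL weight.
\max_{p_\theta} \;
  \mathbb{E}_{y \sim \mathcal{D},\; x \sim p_\theta(\cdot \mid y)}
    \bigl[ r(x, y) \bigr]
  \;-\; \beta \,
  D_{\mathrm{KL}}\!\bigl[ p_\theta(x \mid y) \,\big\|\, p_{\mathrm{ref}}(x \mid y) \bigr]
```

In this form, a larger $\beta$ keeps the aligned model closer to the reference model, while a smaller $\beta$ lets it pursue reward more aggressively.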