Abstract
We propose Diffusion-Sharpening, a fine-tuning approach that enhances downstream alignment by optimizing sampling trajectories. Existing RL-based fine-tuning methods focus on single training timesteps and neglect trajectory-level alignment, while recent sampling trajectory optimization methods incur significant inference NFE costs. Diffusion-Sharpening overcomes this by using a path integral framework to select optimal trajectories during training, leveraging reward feedback, and amortizing inference costs. Our method demonstrates superior training efficiency with faster convergence, and best inference efficiency without requiring additional NFEs. Extensive experiments show that Diffusion-Sharpening outperforms RL-based fine-tuning methods (e.g., Diffusion-DPO) and sampling trajectory optimization methods (e.g., Inference Scaling) across diverse metrics including text alignment, compositional capabilities, and human preferences, offering a scalable and efficient solution for future diffusion model fine-tuning. Code: https://github.com/Gen-Verse/Diffusion-Sharpening