Abstract
Generative models have made significant impacts across various domains, largely due to their ability to scale during training by increasing data, computational resources, and model size, a phenomenon characterized by scaling laws. Recent research has begun to explore inference-time scaling behavior in Large Language Models (LLMs), revealing how performance can further improve with additional computation during inference. Unlike LLMs, diffusion models inherently possess the flexibility to adjust inference-time computation via the number of denoising steps, although the performance gains typically flatten after a few dozen steps. In this work, we explore the inference-time scaling behavior of diffusion models beyond increasing denoising steps and investigate how the generation performance can further improve with increased computation. Specifically, we consider a search problem aimed at identifying better noises for the diffusion sampling process. We structure the design space along two axes: the verifiers used to provide feedback, and the algorithms used to find better noise candidates. Through extensive experiments on class-conditioned and text-conditioned image generation benchmarks, our findings reveal that increasing inference-time compute leads to substantial improvements in the quality of samples generated by diffusion models, and, given the complicated nature of images, combinations of the components in the framework can be specifically chosen to suit different application scenarios.
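To make the search formulation concrete, the following is a minimal sketch of the two axes, a verifier and a search algorithm, using best-of-N random search as one simple assumed choice of algorithm. The names `toy_sampler`, `toy_verifier`, and `random_search` are hypothetical stand-ins introduced for illustration only; a real setup would replace them with an actual diffusion sampler and a learned or oracle verifier.

```python
import random

def toy_sampler(noise):
    # Hypothetical stand-in for a diffusion sampler: a deterministic
    # map from an initial noise vector to a "sample".
    return [0.5 * z for z in noise]

def toy_verifier(sample):
    # Hypothetical verifier: higher is better. Here it scores
    # closeness of the sample to an all-ones target.
    return -sum((x - 1.0) ** 2 for x in sample)

def random_search(sampler, verifier, n_candidates, dim, seed=0):
    # Draw n_candidates Gaussian noises, run the sampler on each,
    # and keep the candidate the verifier scores highest.
    rng = random.Random(seed)
    best_noise, best_sample, best_score = None, None, float("-inf")
    for _ in range(n_candidates):
        noise = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        sample = sampler(noise)
        score = verifier(sample)
        if score > best_score:
            best_noise, best_sample, best_score = noise, sample, score
    return best_noise, best_sample, best_score

if __name__ == "__main__":
    # More candidates means more inference-time compute spent on search.
    for n in (1, 4, 16, 64):
        _, _, score = random_search(toy_sampler, toy_verifier, n, dim=8)
        print(n, round(score, 3))
```

In this sketch, the number of candidates plays the role of the inference-time compute budget: with a fixed seed, a larger budget evaluates a superset of the noises seen by a smaller one, so the best verifier score cannot decrease as the budget grows.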