Abstract
Diffusion models have achieved remarkable success in generative modeling. Despite more stable training, the loss of diffusion models is not indicative of absolute data-fitting quality, since its optimal value is typically not zero but unknown, leading to confusion between large optimal loss and insufficient model capacity. In this work, we advocate the need to estimate the optimal loss value for diagnosing and improving diffusion models. We first derive the optimal loss in closed form under a unified formulation of diffusion models, and develop effective estimators for it, including a stochastic variant scalable to large datasets with proper control of variance and bias. With this tool, we unlock the inherent metric for diagnosing the training quality of mainstream diffusion model variants, and develop a more performant training schedule based on the optimal loss. Moreover, using models with 120M to 1.5B parameters, we find that the power law is better demonstrated after subtracting the optimal loss from the actual training loss, suggesting a more principled setting for investigating the scaling law for diffusion models.