Abstract
Diffusion models approximate the denoising distribution as a Gaussian and predict its mean, whereas flow matching models reparameterize the Gaussian mean as flow velocity. However, they underperform in few-step sampling due to discretization error and tend to produce over-saturated colors under classifier-free guidance (CFG). To address these limitations, we propose a novel Gaussian mixture flow matching (GMFlow) model: instead of predicting the mean, GMFlow predicts dynamic Gaussian mixture (GM) parameters to capture a multi-modal flow velocity distribution, which can be learned with a KL divergence loss. We demonstrate that GMFlow generalizes previous diffusion and flow matching models where a single Gaussian is learned with an $L_2$ denoising loss. For inference, we derive GM-SDE/ODE solvers that leverage analytic denoising distributions and velocity fields for precise few-step sampling. Furthermore, we introduce a novel probabilistic guidance scheme that mitigates the over-saturation issues of CFG and improves image generation quality. Extensive experiments demonstrate that GMFlow consistently outperforms flow matching baselines in generation quality, achieving a Precision of 0.942 with only 6 sampling steps on ImageNet 256$\times$256.
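To illustrate the relationship between the KL objective and the $L_2$ loss, here is a minimal sketch. It assumes an isotropic GM over the velocity $u$ whose components share one scalar standard deviation; the abstract does not specify the covariance form, so this is an assumption. The KL divergence from the target distribution $p$ to the predicted GM $q$ equals $\mathbb{E}_p[-\log q]$ minus the entropy of $p$. That entropy does not depend on the model, so minimizing the KL amounts to minimizing the negative log-likelihood (NLL) of sampled target velocities. The function `gm_nll` and its arguments are hypothetical names for illustration, not the authors' implementation. With a single component and unit standard deviation, the NLL reduces to $\tfrac{1}{2}\lVert u-\mu\rVert^2$ plus a constant, which is the single-Gaussian $L_2$ case.

```python
import numpy as np

def gm_nll(u, weights, means, log_std):
    """NLL of a target velocity u under an isotropic Gaussian mixture.

    Hypothetical sketch (assumption: all K components share one scalar std).
    u:       (D,) target velocity sample
    weights: (K,) mixture weights summing to 1
    means:   (K, D) component means
    log_std: scalar log of the shared standard deviation
    """
    u = np.asarray(u, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    D = u.shape[-1]
    var = np.exp(2.0 * log_std)
    # Squared distance from u to each component mean, shape (K,)
    sq = np.sum((u[None, :] - means) ** 2, axis=-1)
    # log N(u; mu_k, s^2 I) for each component
    log_comp = -0.5 * sq / var - D * log_std - 0.5 * D * np.log(2.0 * np.pi)
    # Add log mixture weights, then combine with a stable log-sum-exp
    log_mix = np.log(weights) + log_comp
    m = np.max(log_mix)
    return -(m + np.log(np.sum(np.exp(log_mix - m))))

# Single Gaussian with unit std: NLL = 0.5 * ||u - mu||^2 + constant (the L2 case)
u = np.array([0.3, -1.2])
mu = np.array([0.5, -1.0])
const = 0.5 * u.size * np.log(2.0 * np.pi)
l2_half = 0.5 * np.sum((u - mu) ** 2)
single = gm_nll(u, [1.0], mu[None, :], 0.0)

# Bimodal target: a two-component GM near the modes beats one Gaussian at their mean
v = np.array([1.0])
gm_bimodal = gm_nll(v, [0.5, 0.5], [[1.0], [-1.0]], np.log(0.1))
gauss_mean = gm_nll(v, [1.0], [[0.0]], np.log(0.1))
```

The second usage case hints at why a mixture helps: when the velocity distribution has two modes, a single Gaussian placed at their average assigns low likelihood to either mode, whereas the GM can place mass on both.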