Abstract
Consistency models have recently been introduced to accelerate sampling from diffusion models by directly predicting the solution (i.e., data) of the probability flow ODE (PF ODE) from initial noise. However, the training of consistency models requires learning to map all intermediate points along PF ODE trajectories to their corresponding endpoints. This task is much more challenging than the ultimate objective of one-step generation, which only concerns the PF ODE's noise-to-data mapping. We empirically find that this training paradigm limits the one-step generation performance of consistency models. To address this issue, we generalize consistency training to the truncated time range, which allows the model to ignore denoising tasks at earlier time steps and focus its capacity on generation. We propose a new parameterization of the consistency function and a two-stage training procedure that prevents the truncated-time training from collapsing to a trivial solution. Experiments on CIFAR-10 and ImageNet $64\times64$ datasets show that our method achieves better one-step and two-step FIDs than the state-of-the-art consistency models such as iCT-deep, using more than 2$\times$ smaller networks. Project page: https://truncated-cm.github.io/