Abstract
We study Variational Rectified Flow Matching, a framework that enhances classic rectified flow matching by modeling multi-modal velocity vector-fields. At inference time, classic rectified flow matching "moves" samples from a source distribution to the target distribution by solving an ordinary differential equation via integration along a velocity vector-field. At training time, the velocity vector-field is learnt by linearly interpolating between coupled samples, one drawn randomly from the source distribution and one drawn randomly from the target distribution. This leads to "ground-truth" velocity vector-fields that point in different directions at the same location, i.e., the velocity vector-fields are multi-modal/ambiguous. However, since training uses a standard mean-squared-error loss, the learnt velocity vector-field averages the "ground-truth" directions and is not multi-modal. In contrast, variational rectified flow matching learns and samples from multi-modal flow directions. We show on synthetic data, MNIST, CIFAR-10, and ImageNet that variational rectified flow matching leads to compelling results.
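The ambiguity described above can be illustrated with a minimal one-dimensional sketch. It is not the authors' implementation: the two source/target pairs, the time t = 0.5, and the helper names `interpolate`, `target_velocity`, and `mse` are assumptions chosen for illustration. Two linear interpolation paths cross at the same location with opposite "ground-truth" velocities, and the prediction that minimizes mean squared error at that location is their average.

```python
def interpolate(x0, x1, t):
    """Point at time t on the straight line from source x0 to target x1."""
    return (1.0 - t) * x0 + t * x1


def target_velocity(x0, x1):
    """'Ground-truth' velocity of the linear interpolation: d/dt x_t = x1 - x0."""
    return x1 - x0


# Hypothetical (source, target) pairs whose straight-line paths cross at t = 0.5.
pairs = [(-1.0, 1.0), (1.0, -1.0)]
t = 0.5

locations = [interpolate(x0, x1, t) for x0, x1 in pairs]
velocities = [target_velocity(x0, x1) for x0, x1 in pairs]


def mse(pred):
    """Mean squared error of a single predicted velocity against all targets."""
    return sum((pred - v) ** 2 for v in velocities) / len(velocities)


# The single prediction minimizing mean squared error is the mean of the targets.
mse_optimal = sum(velocities) / len(velocities)

print("locations:", locations)      # both paths pass through the same point
print("velocities:", velocities)    # but with opposite directions
print("MSE-optimal prediction:", mse_optimal)
```

In this toy setting both paths pass through location 0 with velocities +2 and -2, so a single-valued velocity field trained with a mean-squared-error loss would predict 0 there, which matches neither direction. This is the averaging effect that a multi-modal velocity model is meant to avoid.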