Abstract
Diffusion bridges are a promising class of deep-learning methods for sampling from unnormalized distributions. Recent works show that the Log Variance (LV) loss consistently outperforms the reverse Kullback-Leibler (rKL) loss when using the reparametrization trick to compute rKL gradients. While the on-policy LV loss yields identical gradients to the rKL loss when combined with the log-derivative trick for diffusion samplers with non-learnable forward processes, this equivalence does not hold for diffusion bridges or when diffusion coefficients are learned. Based on this insight, we argue that for diffusion bridges the LV loss does not represent an optimization objective that can be motivated like the rKL loss via the data processing inequality. Our analysis shows that employing the rKL loss with the log-derivative trick (rKL-LD) not only avoids these conceptual problems but also consistently outperforms the LV loss. Experimental results with different types of diffusion bridges on challenging benchmarks show that samplers trained with the rKL-LD loss achieve better performance. From a practical perspective, we find that rKL-LD requires significantly less hyperparameter optimization and yields more stable training behavior.