Abstract
Gaussian processes are often considered a gold standard in uncertainty estimation with low-dimensional data, but they have difficulty scaling to high-dimensional inputs. Deep Kernel Learning (DKL) was introduced as a solution to this problem: a deep feature extractor is used to transform the inputs over which a Gaussian process's kernel is defined. However, DKL has been shown to provide unreliable uncertainty estimates in practice. We study why, and show that for certain feature extractors, "far-away" data points are mapped to the same features as those of training-set points. With this insight we propose to constrain DKL's feature extractor to approximately preserve distances through a bi-Lipschitz constraint, resulting in a feature space favorable to DKL. We obtain a model, DUE, which demonstrates uncertainty quality outperforming previous DKL and single-forward-pass uncertainty methods, while maintaining the speed and accuracy of softmax neural networks.
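
For concreteness, the two ingredients named above can be written out as follows. The notation is an assumption introduced here for illustration and does not appear in the abstract: $f_\theta$ denotes the feature extractor, $k_{\text{base}}$ a standard kernel applied in feature space, and $0 < L_1 \le L_2$ the lower and upper Lipschitz constants. The deep kernel is the base kernel evaluated on extracted features, and the bi-Lipschitz condition bounds feature-space distances from both sides by input-space distances:

\[
k_{\text{DKL}}(x, x') = k_{\text{base}}\bigl(f_\theta(x),\, f_\theta(x')\bigr),
\qquad
L_1 \,\lVert x - x' \rVert \;\le\; \lVert f_\theta(x) - f_\theta(x') \rVert \;\le\; L_2 \,\lVert x - x' \rVert .
\]

Under this reading, the lower bound is what rules out mapping a far-away input onto the features of a training point, while the upper bound limits how much small input changes can be amplified.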