Evaluating Uncertainty in Deep Gaussian Processes

Abstract

Reliable uncertainty estimates are crucial in modern machine learning. Deep Gaussian Processes (DGPs) and Deep Sigma Point Processes (DSPPs) extend GPs hierarchically, offering promising methods for uncertainty quantification grounded in Bayesian principles. However, their empirical calibration and robustness under distribution shift relative to baselines like Deep Ensembles remain understudied. This work evaluates these models on regression (CASP dataset) and classification (ESR dataset) tasks, assessing predictive performance (MAE, Accuracy), calibration using Negative Log-Likelihood (NLL) and Expected Calibration Error (ECE), alongside robustness under various synthetic feature-level distribution shifts. Results indicate DSPPs provide strong in-distribution calibration leveraging their sigma point approximations. However, compared to Deep Ensembles, which demonstrated superior robustness in both performance and calibration under the tested shifts, the GP-based methods showed vulnerabilities, exhibiting particular sensitivity in the observed metrics. Our findings underscore ensembles as a robust baseline, suggesting that while deep GP methods offer good in-distribution calibration, their practical robustness under distribution shift requires careful evaluation. To facilitate reproducibility, we make our code available at https://github.com/matthjs/xai-gp.
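
For readers unfamiliar with the two calibration metrics named above, the following is a minimal sketch of how they are commonly computed. It is not taken from the linked repository: the function names, the equal-width binning scheme, and the default of 10 bins are assumptions made for illustration, and the paper's exact formulation may differ.

```python
import math
from typing import Sequence


def gaussian_nll(y: float, mean: float, var: float) -> float:
    """Negative log-likelihood of y under N(mean, var).

    Assumption: a Gaussian predictive distribution, as is typical for
    regression; the paper's likelihood may differ.
    """
    if var <= 0:
        raise ValueError("variance must be positive")
    return 0.5 * math.log(2 * math.pi * var) + (y - mean) ** 2 / (2 * var)


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,  # hypothetical default, not specified in the abstract
) -> float:
    """ECE with equal-width confidence bins.

    ECE = sum over bins of (bin size / N) * |bin accuracy - bin mean confidence|.
    """
    if len(confidences) != len(correct) or len(confidences) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    conf_sum = [0.0] * n_bins
    hit_sum = [0.0] * n_bins
    count = [0] * n_bins

    for c, ok in zip(confidences, correct):
        # Confidence 1.0 falls into the last bin rather than out of range.
        idx = min(int(c * n_bins), n_bins - 1)
        conf_sum[idx] += c
        hit_sum[idx] += 1.0 if ok else 0.0
        count[idx] += 1

    n = len(confidences)
    ece = 0.0
    for s_conf, s_hit, cnt in zip(conf_sum, hit_sum, count):
        if cnt == 0:
            continue  # empty bins contribute nothing
        accuracy = s_hit / cnt
        mean_conf = s_conf / cnt
        ece += (cnt / n) * abs(accuracy - mean_conf)
    return ece
```

Lower values are better for both quantities. An ECE of zero means that, within every bin, the average predicted confidence matches the observed accuracy.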
