Evaluating Predictive Distributions: Does Bayesian Deep Learning Work?

Abstract

Posterior predictive distributions quantify uncertainties ignored by point estimates. This paper introduces The Neural Testbed, which provides tools for the systematic evaluation of agents that generate such predictions. Crucially, these tools assess not only the quality of marginal predictions per input, but also joint predictions given many inputs. Joint distributions are often critical for useful uncertainty quantification, but they have been largely overlooked by the Bayesian deep learning community. We benchmark several approaches to uncertainty estimation using a neural-network-based data generating process. Our results reveal the importance of evaluation beyond marginal predictions. Further, they reconcile sources of confusion in the field, such as why Bayesian deep learning approaches that generate accurate marginal predictions perform poorly in sequential decision tasks, how incorporating priors can be helpful, and what roles epistemic versus aleatoric uncertainty play when evaluating performance. We also present experiments on real-world challenge datasets, which show a high correlation with testbed results, and that the importance of evaluating joint predictive distributions carries over to real data. As part of this effort, we open-source The Neural Testbed, including all implementations from this paper.
