Abstract
During the past five years the Bayesian deep learning community has developed increasingly accurate and efficient approximate inference procedures that allow for Bayesian inference in deep neural networks. However, despite this algorithmic progress and the promise of improved uncertainty quantification and sample efficiency, there are, as of early 2020, no publicized deployments of Bayesian neural networks in industrial practice. In this work we cast doubt on the current understanding of Bayes posteriors in popular deep neural networks: we demonstrate through careful MCMC sampling that the posterior predictive induced by the Bayes posterior yields systematically worse predictions compared to simpler methods, including point estimates obtained from SGD. Furthermore, we demonstrate that predictive performance is improved significantly through the use of a "cold posterior" that overcounts evidence. Such cold posteriors sharply deviate from the Bayesian paradigm but are commonly used as a heuristic in Bayesian deep learning papers. We put forward several hypotheses that could explain cold posteriors and evaluate the hypotheses through experiments. Our work questions the goal of accurate posterior approximations in Bayesian deep learning: If the true Bayes posterior is poor, what is the use of more accurate approximations? Instead, we argue that it is timely to focus on understanding the origin of the improved performance of cold posteriors.