Abstract
The posterior over Bayesian neural network (BNN) parameters is extremely high-dimensional and non-convex. For computational reasons, researchers approximate this posterior using inexpensive mini-batch methods such as mean-field variational inference or stochastic-gradient Markov chain Monte Carlo (SGMCMC). To investigate foundational questions in Bayesian deep learning, we instead use full-batch Hamiltonian Monte Carlo (HMC) on modern architectures. We show that (1) BNNs can achieve significant performance gains over standard training and deep ensembles; (2) a single long HMC chain can provide a comparable representation of the posterior to multiple shorter chains; (3) in contrast to recent studies, we find posterior tempering is not needed for near-optimal performance, with little evidence for a "cold posterior" effect, which we show is largely an artifact of data augmentation; (4) Bayesian model average (BMA) performance is robust to the choice of prior scale, and relatively similar for diagonal Gaussian, mixture of Gaussian, and logistic priors; (5) Bayesian neural networks show surprisingly poor generalization under domain shift; (6) while cheaper alternatives such as deep ensembles and SGMCMC methods can provide good generalization, they provide distinct predictive distributions from HMC. Notably, deep ensemble predictive distributions are about as close to HMC as those of standard SGLD, and closer than those of standard variational inference.