Bayesian Deep Learning and a Probabilistic Perspective of Generalization

Abstract

The key distinguishing property of a Bayesian approach is marginalization,rather than using a single setting of weights. Bayesian marginalization canparticularly improve the accuracy and calibration of modern deep neuralnetworks, which are typically underspecified by the data, and can representmany compelling but different solutions. We show that deep ensembles provide aneffective mechanism for approximate Bayesian marginalization, and propose arelated approach that further improves the predictive distribution bymarginalizing within basins of attraction, without significant overhead. Wealso investigate the prior over functions implied by a vague distribution overneural network weights, explaining the generalization properties of such modelsfrom a probabilistic perspective. From this perspective, we explain resultsthat have been presented as mysterious and distinct to neural networkgeneralization, such as the ability to fit images with random labels, and showthat these results can be reproduced with Gaussian processes. Finally, weprovide a Bayesian perspective on tempering for calibrating predictivedistributions.

Quick Read (beta)

loading the full paper ...