Differentiable Annealed Importance Sampling and the Perils of Gradient Noise

  • 2021-07-21 17:10:14
  • Guodong Zhang, Kyle Hsu, Jianing Li, Chelsea Finn, Roger Grosse


Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation, but are not fully differentiable due to the use of Metropolis-Hastings (MH) correction steps. Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective using gradient-based methods. To this end, we propose a differentiable AIS algorithm by abandoning MH steps, which further unlocks mini-batch computation. We provide a detailed convergence analysis for Bayesian linear regression which goes beyond previous analyses by explicitly accounting for non-perfect transitions. Using this analysis, we prove that our algorithm is consistent in the full-batch setting and provide a sublinear convergence rate. However, we show that the algorithm is inconsistent when mini-batch gradients are used due to a fundamental incompatibility between the goals of last-iterate convergence to the posterior and elimination of the pathwise stochastic error. This result is in stark contrast to our experience with stochastic optimization and stochastic gradient Langevin dynamics, where the effects of gradient noise can be washed out by taking more steps of a smaller size. Our negative result relies crucially on our explicit consideration of convergence to the stationary distribution, and it helps explain the difficulty of developing practically effective AIS-like algorithms that exploit mini-batch gradients.
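A minimal sketch of what "abandoning MH steps" can look like, under stated assumptions: the transitions are unadjusted Langevin (ULA) steps (the paper's own MH-free kernel may differ), the annealing path is the geometric path p(w) p(y | X, w)^β with a linear schedule from β = 0 (prior) to β = 1 (posterior), and the model is conjugate Bayesian linear regression with p(w) = N(0, α⁻¹I) and p(y | X, w) = N(Xw, σ²I), so the exact log marginal likelihood is available for comparison. All function names, the synthetic data, and the numeric settings are hypothetical. Because no accept/reject branch appears, every operation is a smooth function of the parameters and the sampled noise, so the same loop written in an autodiff framework could in principle be differentiated; this NumPy version only computes the estimate.

```python
import numpy as np

# Hypothetical sketch: AIS with unadjusted Langevin (ULA) transitions and no
# Metropolis-Hastings correction, on Bayesian linear regression with
# prior w ~ N(0, I/alpha) and likelihood y | X, w ~ N(Xw, sigma2 * I).


def log_marginal_exact(X, y, alpha, sigma2):
    """Closed-form log p(y | X): y ~ N(0, sigma2*I + X X^T / alpha)."""
    n = X.shape[0]
    cov = sigma2 * np.eye(n) + X @ X.T / alpha
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)


def log_lik(W, X, y, sigma2):
    """Full-batch log p(y | X, w) for every chain; W has shape (chains, d)."""
    resid = y[None, :] - W @ X.T  # shape (chains, n)
    n = X.shape[0]
    return -0.5 * (n * np.log(2.0 * np.pi * sigma2)
                   + np.sum(resid ** 2, axis=1) / sigma2)


def grad_log_target(W, X, y, alpha, sigma2, beta):
    """Gradient of log prior + beta * log likelihood (the tempered target)."""
    resid = y[None, :] - W @ X.T
    return -alpha * W + beta * (resid @ X) / sigma2


def ais_ula(X, y, alpha, sigma2, num_steps=3000, step_size=0.01,
            num_chains=256, seed=0):
    """Estimate log p(y | X) with MH-free AIS along a linear beta schedule."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(0.0, 1.0, num_steps + 1)
    # Exact samples from the prior, which is the beta = 0 distribution.
    W = rng.normal(size=(num_chains, X.shape[1])) / np.sqrt(alpha)
    log_w = np.zeros(num_chains)
    for k in range(1, num_steps + 1):
        # The weight increment is evaluated at the state before the transition.
        log_w += (betas[k] - betas[k - 1]) * log_lik(W, X, y, sigma2)
        # One Langevin step toward pi_{beta_k}; there is no accept/reject,
        # so the update is smooth in W, the parameters, and the noise.
        g = grad_log_target(W, X, y, alpha, sigma2, betas[k])
        W = W + step_size * g + np.sqrt(2.0 * step_size) * rng.normal(size=W.shape)
    # log(mean(exp(log_w))), computed stably by subtracting the max.
    m = log_w.max()
    return m + np.log(np.mean(np.exp(log_w - m)))


# Small synthetic problem (illustrative values, not from the paper).
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(8, 2))
w_true = np.array([0.5, -0.5])
alpha, sigma2 = 1.0, 2.0
y = X @ w_true + np.sqrt(sigma2) * data_rng.normal(size=8)

exact = log_marginal_exact(X, y, alpha, sigma2)
est = ais_ula(X, y, alpha, sigma2)
print(f"exact log p(y|X) = {exact:.3f}, AIS-ULA estimate = {est:.3f}")
```

With a small step size and many steps the estimate should land near the closed-form value; shrinking `step_size` while increasing `num_steps` corresponds to the full-batch regime in which the abstract reports consistency. Replacing the full-batch gradient in `grad_log_target` with a mini-batch estimate is where the abstract's negative result would apply; this sketch does not attempt to demonstrate that effect.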

