Information Newton's flow: second-order optimization method in probability space

Abstract

We introduce a framework for Newton's flows in probability space with information metrics, named information Newton's flows. Here two information metrics are considered, including both the Fisher-Rao metric and the Wasserstein-2 metric. Several examples of information Newton's flows for learning objective/loss functions are provided, such as Kullback-Leibler (KL) divergence, Maximum mean discrepancy (MMD), and cross entropy. The asymptotic convergence results of proposed Newton's methods are provided. A known fact is that overdamped Langevin dynamics correspond to Wasserstein gradient flows of KL divergence. Extending this fact to Wasserstein Newton's flows of KL divergence, we derive Newton's Langevin dynamics. We provide examples of Newton's Langevin dynamics in both one-dimensional space and Gaussian families. For the numerical implementation, we design sampling efficient variational methods to approximate Wasserstein Newton's directions. Several numerical examples in Gaussian families and Bayesian logistic regression are shown to demonstrate the effectiveness of the proposed method.
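
As a point of reference for the known fact cited above, the following is a minimal sketch of the standard first-order overdamped Langevin dynamics, dX = -grad f(X) dt + sqrt(2) dB, whose particle density follows the Wasserstein gradient flow of the KL divergence to the target proportional to exp(-f). It is not the Newton's Langevin dynamics derived in the paper. The function name `langevin_samples`, the Gaussian target, the step size, and the particle count are all assumptions chosen for illustration.

```python
import numpy as np

def langevin_samples(grad_f, x0, step, n_steps, rng):
    """Euler-Maruyama discretization of dX = -grad f(X) dt + sqrt(2) dB.

    Each entry of x0 is an independent particle; the target density is
    proportional to exp(-f).
    """
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        # Gradient step on f plus injected Gaussian noise of variance 2 * step.
        x = x - step * grad_f(x) + np.sqrt(2.0 * step) * noise
    return x

# Assumed target for illustration: 1-D Gaussian N(mu, sigma^2),
# i.e. f(x) = (x - mu)^2 / (2 sigma^2).
mu, sigma = 2.0, 0.5

def grad_f(x):
    return (x - mu) / sigma**2

rng = np.random.default_rng(0)
initial = rng.standard_normal(5000)  # particles start from N(0, 1)
particles = langevin_samples(grad_f, initial, step=0.01, n_steps=500, rng=rng)

sample_mean = particles.mean()
sample_var = particles.var()
print(sample_mean, sample_var)  # should be close to mu and sigma^2
```

The time discretization introduces a small bias in the stationary variance that shrinks with the step size, so the empirical variance is expected to sit slightly above sigma^2 rather than match it exactly.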
