DiCE: The Infinitely Differentiable Monte-Carlo Estimator

Abstract

The score function estimator is widely used for estimating gradients of stochastic objectives in Stochastic Computation Graphs (SCGs), e.g., in reinforcement learning and meta-learning. While deriving first-order gradient estimators by differentiating a surrogate loss (SL) objective is computationally and conceptually simple, using the same approach for higher-order gradients is more challenging. Firstly, analytically deriving and implementing such estimators is laborious and not compliant with automatic differentiation. Secondly, repeatedly applying SL to construct new objectives for each order of gradient involves increasingly cumbersome graph manipulations. Lastly, to match the first-order gradient under differentiation, SL treats part of the cost as a fixed sample, which we show leads to missing and wrong terms in higher-order gradient estimators. To address all these shortcomings in a unified way, we introduce DiCE, which provides a single objective that can be differentiated repeatedly, generating correct gradient estimators of any order in SCGs. Unlike SL, DiCE relies on automatic differentiation to perform the requisite graph manipulations. We verify the correctness of DiCE both through a proof and through numerical evaluation of the DiCE gradient estimates. We also use DiCE to propose and evaluate a novel approach for multi-agent learning. Our code is available at https://goo.gl/xkkGxN.
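To make the idea of "a single objective that can be differentiated repeatedly" concrete, here is a minimal JAX sketch. The abstract does not spell out the operator, so the sketch assumes the form in which DiCE is usually written, exp(tau - stop_gradient(tau)), which evaluates to 1 but differentiates like exp(tau). The toy problem (one Bernoulli variable with sigmoid parameter `theta`), the cost table `COSTS`, and all function names are hypothetical choices for illustration, not the authors' code. The expectation over x is enumerated exactly rather than sampled, so the output can be compared directly with derivatives of the analytic expected cost.

```python
import jax
import jax.numpy as jnp

# Hypothetical cost f(x) for the two outcomes x = 0 and x = 1.
COSTS = jnp.array([1.0, 3.0])


def magic_box(tau):
    # Assumed form of the DiCE operator: value 1, derivative like exp(tau).
    return jnp.exp(tau - jax.lax.stop_gradient(tau))


def log_prob(theta, x):
    # log p(x; theta) for x ~ Bernoulli(sigmoid(theta)).
    logit = theta if x == 1 else -theta
    return jax.nn.log_sigmoid(logit)


def dice_objective(theta, x):
    # One objective, reused unchanged for every derivative order.
    return magic_box(log_prob(theta, x)) * COSTS[x]


def expected_cost(theta):
    # Analytic E[f(x)], used only as a reference.
    p1 = jax.nn.sigmoid(theta)
    return (1.0 - p1) * COSTS[0] + p1 * COSTS[1]


def nth_derivative(fn, order):
    # Apply jax.grad `order` times.
    for _ in range(order):
        fn = jax.grad(fn)
    return fn


def dice_estimate(theta, order):
    # E_x[ d^n/dtheta^n of the DiCE objective ], with the expectation
    # computed by enumerating x instead of Monte Carlo sampling.
    total = 0.0
    for x in (0, 1):
        weight = jnp.exp(log_prob(theta, x))  # p(x; theta), a plain number here
        deriv = nth_derivative(lambda t, x=x: dice_objective(t, x), order)
        total = total + weight * deriv(theta)
    return total


theta0 = 0.3
for n in range(4):
    reference = nth_derivative(expected_cost, n)(theta0)
    print(n, float(dice_estimate(theta0, n)), float(reference))
```

In a real SCG the enumeration would be replaced by an average over sampled trajectories, which gives a noisy estimate of the same quantities.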
