On the Iteration Complexity of Hypergradient Computation

Abstract

We study a general class of bilevel problems, consisting in the minimizationof an upper-level objective which depends on the solution to a parametricfixed-point equation. Important instances arising in machine learning includehyperparameter optimization, meta-learning, and certain graph and recurrentneural networks. Typically the gradient of the upper-level objective(hypergradient) is hard or even impossible to compute exactly, which has raisedthe interest in approximation methods. We investigate some popular approachesto compute the hypergradient, based on reverse mode iterative differentiationand approximate implicit differentiation. Under the hypothesis that the fixedpoint equation is defined by a contraction mapping, we present a unifiedanalysis which allows for the first time to quantitatively compare thesemethods, providing explicit bounds for their iteration complexity. Thisanalysis suggests a hierarchy in terms of computational efficiency among theabove methods, with approximate implicit differentiation based on conjugategradient performing best. We present an extensive experimental comparison amongthe methods which confirm the theoretical findings.

Quick Read (beta)

loading the full paper ...