Implicit differentiation of Lasso-type models for hyperparameter optimization

Abstract

Setting regularization parameters for Lasso-type estimators is notoriouslydifficult, though crucial in practice. The most popular hyperparameteroptimization approach is grid-search using held-out validation data.Grid-search however requires to choose a predefined grid for each parameter,which scales exponentially in the number of parameters. Another approach is tocast hyperparameter optimization as a bi-level optimization problem, one cansolve by gradient descent. The key challenge for these methods is theestimation of the gradient with respect to the hyperparameters. Computing thisgradient via forward or backward automatic differentiation is possible yetusually suffers from high memory consumption. Alternatively implicitdifferentiation typically involves solving a linear system which can beprohibitive and numerically unstable in high dimension. In addition, implicitdifferentiation usually assumes smooth loss functions, which is not the casefor Lasso-type problems. This work introduces an efficient implicitdifferentiation algorithm, without matrix inversion, tailored for Lasso-typeproblems. Our approach scales to high-dimensional data by leveraging thesparsity of the solutions. Experiments demonstrate that the proposed methodoutperforms a large number of standard methods to optimize the error onheld-out data, or the Stein Unbiased Risk Estimator (SURE).

Quick Read (beta)

loading the full paper ...