Abstract
Successive Halving is a popular algorithm for hyperparameter optimization which allocates exponentially more resources to promising candidates. However, the algorithm typically relies on intermediate performance values to make resource allocation decisions, which can cause it to prematurely prune slow starters that would eventually become the best candidate. We investigate whether guiding Successive Halving with learning curve predictions based on Latent Kronecker Gaussian Processes can overcome this limitation. In a large-scale empirical study involving different neural network architectures and a click prediction dataset, we compare this predictive approach to the standard approach based on current performance values. Our experiments show that, although the predictive approach achieves competitive performance, it is not Pareto optimal compared to investing more resources into the standard approach, because it requires fully observed learning curves as training data. However, this downside could be mitigated by leveraging existing learning curve data.
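
To make the pruning behavior described above concrete, the following is a minimal sketch of Successive Halving on synthetic learning curves. It is an illustration under stated assumptions, not the experimental setup of this study: the function names, the reduction factor `eta`, the toy exponential curves in `CURVES`, and the oracle `final_loss` (a stand-in for a learning curve predictor, not a Latent Kronecker Gaussian Process) are all hypothetical.

```python
import math
from typing import Callable, Sequence

# Hypothetical toy curves: loss(t) = final + amplitude * exp(-t / timescale).
# Candidate 1 is a slow starter with the best final loss.
CURVES = {
    0: (0.30, 0.2, 1.0),
    1: (0.10, 0.9, 8.0),
    2: (0.40, 0.3, 1.0),
    3: (0.35, 0.5, 2.0),
}


def observed_loss(candidate: int, budget: int) -> float:
    """Loss actually observed after training for `budget` units."""
    final, amplitude, timescale = CURVES[candidate]
    return final + amplitude * math.exp(-budget / timescale)


def final_loss(candidate: int, budget: int) -> float:
    """Oracle stand-in for a predictor: returns the asymptotic loss."""
    return CURVES[candidate][0]


def successive_halving(
    candidates: Sequence[int],
    score: Callable[[int, int], float],
    min_budget: int = 1,
    eta: int = 2,
) -> int:
    """Return the last surviving candidate; lower scores are better."""
    survivors = list(candidates)
    budget = min_budget
    while len(survivors) > 1:
        # Rank survivors by their score at the current budget.
        scores = {c: score(c, budget) for c in survivors}
        keep = max(1, len(survivors) // eta)
        survivors = sorted(survivors, key=scores.get)[:keep]
        # Survivors receive eta times more resources in the next round.
        budget *= eta
    return survivors[0]


standard_choice = successive_halving(list(CURVES), observed_loss)
predictive_choice = successive_halving(list(CURVES), final_loss)
```

In this toy setting, ranking by the currently observed loss discards the slow starter (candidate 1) in the first round, whereas ranking by a perfect prediction of the final loss retains it. A real predictor would be imperfect and would need training data, which is the trade-off examined in the study.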