Abstract
Machine/deep learning models have been widely adopted for predicting the configuration performance of software systems. However, a crucial yet unaddressed challenge is how to cater for the sparsity inherited from the configuration landscape: the influence of configuration options (features) and the distribution of data samples are highly sparse. In this paper, we propose a model-agnostic and sparsity-robust framework for predicting configuration performance, dubbed DaL, based on the new paradigm of dividable learning that builds a model via "divide-and-learn". To handle sample sparsity, the samples from the configuration landscape are divided into distant divisions, for each of which we build a sparse local model, e.g., a regularized Hierarchical Interaction Neural Network, to deal with the feature sparsity. A newly given configuration is then assigned to the model of the right division for the final prediction. Further, DaL adaptively determines the optimal number of divisions required for a system and sample size without any extra training or profiling. Experimental results from 12 real-world systems and five sets of training data reveal that, compared with the state-of-the-art approaches, DaL performs no worse than the best counterpart in 44 out of 60 cases, with up to 1.61x improvement in accuracy; requires fewer samples to reach the same or better accuracy; and incurs acceptable training overhead. In particular, the mechanism that adapts the parameter d reaches the optimal value in 76.43% of the individual runs. The results also confirm that the paradigm of dividable learning is more suitable than other similar paradigms, such as ensemble learning, for predicting configuration performance. Practically, DaL considerably improves different global models when they are used as the underlying local models, further demonstrating its flexibility.
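The "divide-and-learn" idea summarized above can be illustrated with a minimal sketch. It is not the DaL implementation: it assumes exactly two divisions, a single threshold split on one option chosen by minimizing squared error, and a mean predictor standing in for the sparse local model. The function names (`divide`, `fit`, `predict`) and the toy data are hypothetical and only serve the illustration.

```python
from statistics import mean


def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def divide(configs, perfs):
    """Pick the (option index, threshold) whose two divisions have the lowest total SSE."""
    best = None
    for j in range(len(configs[0])):
        # Candidate thresholds: every distinct value of option j except the smallest.
        for t in sorted({c[j] for c in configs})[1:]:
            left = [p for c, p in zip(configs, perfs) if c[j] < t]
            right = [p for c, p in zip(configs, perfs) if c[j] >= t]
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:
        raise ValueError("no option takes more than one value")
    return best[1], best[2]


def fit(configs, perfs):
    """Divide the samples, then train one local model per division.

    Assumption: the local model is just the mean performance of the division,
    a placeholder for a regularized neural network or any other learner.
    """
    j, t = divide(configs, perfs)
    left = [p for c, p in zip(configs, perfs) if c[j] < t]
    right = [p for c, p in zip(configs, perfs) if c[j] >= t]
    return {"option": j, "threshold": t, "local": (mean(left), mean(right))}


def predict(model, config):
    """Assign a new configuration to its division and query that local model."""
    side = 0 if config[model["option"]] < model["threshold"] else 1
    return model["local"][side]


# Toy data (hypothetical): option 0 toggles a cache, option 1 is a thread count.
configs = [(0, 1), (0, 2), (0, 4), (1, 1), (1, 2), (1, 4)]
perfs = [100.0, 98.0, 96.0, 12.0, 10.0, 8.0]

model = fit(configs, perfs)
print(model["option"], model["threshold"])  # split found on the cache option
print(predict(model, (1, 8)))               # unseen configuration, cache enabled
```

In this toy landscape the cache option dominates performance, so the samples fall into two distant groups, and a separate local model per group avoids averaging across them. A framework such as the one described would replace the fixed two-way split with an adaptively chosen number of divisions and the mean predictor with a learned model.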