An Efficient Source Model Selection Framework in Model Databases

Abstract

With the explosive increase of big data, training a Machine Learning (ML) model has become a computation-intensive workload that can take days or even weeks. Reusing an already trained model, known as transfer learning, has therefore received attention. Transfer learning avoids training a new model from scratch by transferring knowledge from a source task to a target task. Existing transfer learning methods mostly focus on how to improve the performance of the target task through a specific source model, and assume that the source model is given. Although many source models are available, it is difficult for data scientists to select the best source model for the target task manually. Hence, how to efficiently select a suitable source model in a model database for model reuse is an interesting but unsolved problem. In this paper, we propose SMS, an effective, efficient, and flexible source model selection framework. SMS is effective even when the source and target datasets have significantly different data labels, flexible enough to support source models with any type of structure, and efficient because it avoids any training process. For each source model, SMS first vectorizes the samples in the target dataset into soft labels by directly applying the model to the target dataset, then fits Gaussian distributions to the clusters of soft labels, and finally measures the distinguishing ability of the source model using a Gaussian mixture-based metric. Moreover, we present an improved SMS (I-SMS), which decreases the number of outputs of the source model. I-SMS can significantly reduce the selection time while retaining the selection performance of SMS. Extensive experiments on a range of practical model reuse workloads demonstrate the effectiveness and efficiency of SMS.
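
The three steps described above (soft labels, Gaussian fitting, scoring) can be illustrated with a minimal Python sketch. It is not the SMS implementation: the abstract does not define the Gaussian mixture-based metric, so the score used here (the fraction of target samples whose own class Gaussian has the highest prior-weighted density) is a stand-in chosen for illustration. The sketch also assumes that the clusters are formed by grouping soft labels by target label, that the Gaussians are diagonal, and that each source model is a plain callable returning a soft-label vector. All function names are hypothetical.

```python
import math
from collections import defaultdict


def fit_diag_gaussians(soft_labels, targets, eps=1e-6):
    """Fit one diagonal Gaussian per target class to the soft-label vectors.

    Returns {class: (prior, mean, variance)}. Grouping by target label and
    the diagonal covariance are assumptions made for this sketch.
    """
    groups = defaultdict(list)
    for vec, y in zip(soft_labels, targets):
        groups[y].append(vec)
    total = len(soft_labels)
    params = {}
    for y, vecs in groups.items():
        dim = len(vecs[0])
        mean = [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]
        # eps keeps the variance strictly positive for degenerate clusters.
        var = [sum((v[d] - mean[d]) ** 2 for v in vecs) / len(vecs) + eps
               for d in range(dim)]
        params[y] = (len(vecs) / total, mean, var)
    return params


def log_density(vec, mean, var):
    """Log-density of a diagonal Gaussian at vec."""
    return sum(-0.5 * (math.log(2 * math.pi * s) + (x - m) ** 2 / s)
               for x, m, s in zip(vec, mean, var))


def separability_score(soft_labels, targets):
    """Stand-in metric: share of samples assigned to their own class Gaussian."""
    params = fit_diag_gaussians(soft_labels, targets)
    hits = 0
    for vec, y in zip(soft_labels, targets):
        best = max(params, key=lambda c: math.log(params[c][0])
                   + log_density(vec, params[c][1], params[c][2]))
        hits += (best == y)
    return hits / len(targets)


def select_source_model(models, target_x, target_y):
    """Score every source model without any training and return the best one."""
    scores = {}
    for name, model in models.items():
        soft_labels = [model(x) for x in target_x]  # step 1: vectorize samples
        scores[name] = separability_score(soft_labels, target_y)  # steps 2-3
    return max(scores, key=scores.get), scores
```

A call such as `select_source_model({"a": model_a, "b": model_b}, target_x, target_y)` returns the name of the highest-scoring model together with all scores. Only forward passes of the source models are needed, which matches the claim that selection avoids any training process.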
