Head2Toe: Utilizing Intermediate Representations for Better Transfer Learning

Abstract

Transfer-learning methods aim to improve performance in a data-scarce target domain using a model pretrained on a data-rich source domain. A cost-efficient strategy, linear probing, involves freezing the source model and training a new classification head for the target domain. This strategy is outperformed by a more costly but state-of-the-art method -- fine-tuning all parameters of the source model to the target domain -- possibly because fine-tuning allows the model to leverage useful information from intermediate layers which is otherwise discarded by the later pretrained layers. We explore the hypothesis that these intermediate layers might be directly exploited. We propose a method, Head-to-Toe probing (Head2Toe), that selects features from all layers of the source model to train a classification head for the target domain. In evaluations on VTAB-1k, Head2Toe matches the performance obtained with fine-tuning on average while reducing training and storage costs a hundredfold or more. Critically, for out-of-distribution transfer, Head2Toe outperforms fine-tuning.
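To make the idea of probing from all layers concrete, the following is a minimal sketch, not the authors' implementation. It assumes that per-layer activations of the frozen source model have already been extracted and pooled into arrays of shape (num_examples, num_features). The abstract does not say how features are selected, so the scoring rule used here (the weight norm of an initial linear probe) is an assumption chosen for illustration, and the names `train_linear_head`, `select_features`, `all_layer_probe`, and `predict` are hypothetical.

```python
import numpy as np


def softmax(z):
    # Subtract the row-wise max for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_linear_head(x, y, num_classes, steps=300, lr=0.5, l2=1e-3):
    """Fit a softmax-regression head with full-batch gradient descent."""
    n, d = x.shape
    w = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[y]
    for _ in range(steps):
        grad_logits = (softmax(x @ w + b) - onehot) / n
        w -= lr * (x.T @ grad_logits + l2 * w)
        b -= lr * grad_logits.sum(axis=0)
    return w, b


def select_features(x, y, num_classes, keep_fraction=0.25):
    """Assumed scoring rule: rank features by the norm of their probe weights."""
    w, _ = train_linear_head(x, y, num_classes)
    scores = np.linalg.norm(w, axis=1)
    k = max(1, int(keep_fraction * x.shape[1]))
    return np.sort(np.argsort(scores)[-k:])


def all_layer_probe(layer_features, y, num_classes, keep_fraction=0.25):
    """Concatenate features from every layer, select a subset, train a head."""
    x = np.concatenate(layer_features, axis=1)
    mean, std = x.mean(axis=0), x.std(axis=0) + 1e-8
    x = (x - mean) / std
    idx = select_features(x, y, num_classes, keep_fraction)
    w, b = train_linear_head(x[:, idx], y, num_classes)
    return idx, w, b, (mean, std)


def predict(layer_features, idx, w, b, stats):
    """Apply the trained head to new per-layer features."""
    mean, std = stats
    x = (np.concatenate(layer_features, axis=1) - mean) / std
    return np.argmax(x[:, idx] @ w + b, axis=1)
```

The source model is never updated in this sketch: only the selected feature indices and the head's weights need to be trained and stored, which is the kind of cost saving the abstract describes relative to fine-tuning.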
