On the Emergence of Cross-Task Linearity in the Pretraining-Finetuning Paradigm

Abstract

The pretraining-finetuning paradigm has become the prevailing trend in modern deep learning. In this work, we discover an intriguing linear phenomenon in models that are initialized from a common pretrained checkpoint and finetuned on different tasks, termed as Cross-Task Linearity (CTL). Specifically, we show that if we linearly interpolate the weights of two finetuned models, the features in the weight-interpolated model are often approximately equal to the linear interpolation of features in two finetuned models at each layer. We provide comprehensive empirical evidence supporting that CTL consistently occurs for finetuned models that start from the same pretrained checkpoint. We conjecture that in the pretraining-finetuning paradigm, neural networks approximately function as linear maps, mapping from the parameter space to the feature space. Based on this viewpoint, our study unveils novel insights into explaining model merging/editing, particularly by translating operations from the parameter space to the feature space. Furthermore, we delve deeper into the root cause for the emergence of CTL, highlighting the role of pretraining.
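
The CTL statement in the abstract can be written compactly as a formula. The following is only a sketch: the symbols $\theta_i$, $\theta_j$ (weights of two models finetuned from the same pretrained checkpoint), $f^{(\ell)}(\theta; x)$ (the layer-$\ell$ features of the network with weights $\theta$ on input $x$), and $\alpha$ (the interpolation coefficient) are notation assumed here for illustration and do not appear in the abstract itself.

```latex
% Assumed notation (not from the abstract):
%   \theta_i, \theta_j : weights of two models finetuned from one pretrained checkpoint
%   f^{(\ell)}(\theta; x) : features at layer \ell for weights \theta and input x
%   \alpha \in [0, 1] : interpolation coefficient
\[
  f^{(\ell)}\bigl(\alpha\,\theta_i + (1-\alpha)\,\theta_j;\; x\bigr)
  \;\approx\;
  \alpha\, f^{(\ell)}(\theta_i;\, x) + (1-\alpha)\, f^{(\ell)}(\theta_j;\, x)
  \qquad \text{for each layer } \ell .
\]
```

Read this way, the left-hand side interpolates in parameter space and then computes features, while the right-hand side computes features first and then interpolates in feature space. The conjecture that the network acts approximately as a linear map from parameters to features amounts to saying that these two orders of operation roughly agree.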
