Fine-Tuning Linear Layers Only Is a Simple yet Effective Way for Task Arithmetic

Abstract

Task arithmetic has recently emerged as a cost-effective and scalable approach to editing pre-trained models directly in weight space by adding the fine-tuned weights of different tasks. Its performance has been further improved by a linearity property, which is characterized by weight disentanglement. Yet conventional linearization methods (e.g., NTK linearization) not only double the time and training cost but also hurt single-task performance. We propose a simple yet effective and efficient method that fine-tunes only linear layers, which improves weight disentanglement and efficiency simultaneously. Specifically, our study reveals that fine-tuning only the linear layers in the attention modules keeps the whole model in a linear regime, significantly improving weight disentanglement. To further understand how our method improves the disentanglement of task arithmetic, we present a comprehensive study of task arithmetic that differentiates the roles of the representation model and the task-specific model. In particular, we find that the representation model plays an important role in improving weight disentanglement, whereas task-specific models such as the classification heads can degrade weight disentanglement. Overall, our work uncovers novel insights into the fundamental mechanisms of task arithmetic and offers a more reliable and effective approach to editing pre-trained models.
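As a rough illustration of the idea summarized in the abstract, the sketch below shows weight-space editing with task vectors on toy weights. It is not the paper's implementation: the parameter names, the `.attn.` naming rule used to pick out attention linear layers, the toy values, and the scaling coefficient `alpha` are all assumptions made for this example.

```python
from typing import Dict, List

# Toy stand-in for a model state dict: parameter name -> flat list of weights.
Weights = Dict[str, List[float]]


def is_attention_linear(name: str) -> bool:
    """Hypothetical naming rule (an assumption, not the paper's code):
    treat weights whose name contains '.attn.' as attention linear layers."""
    return ".attn." in name and name.endswith(".weight")


def task_vector(pretrained: Weights, finetuned: Weights) -> Weights:
    """Task vector: fine-tuned weights minus pre-trained weights, per parameter."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def apply_task_vectors(
    pretrained: Weights, vectors: List[Weights], alpha: float = 1.0
) -> Weights:
    """Add the scaled task vectors to the pre-trained weights."""
    edited = {name: list(values) for name, values in pretrained.items()}
    for vec in vectors:
        for name, delta in vec.items():
            edited[name] = [w + alpha * d for w, d in zip(edited[name], delta)]
    return edited


# Toy weights: only the attention weights differ after "fine-tuning",
# mimicking a setup where every other parameter is kept frozen.
pretrained = {
    "blocks.0.attn.qkv.weight": [1.0, 2.0],
    "blocks.0.mlp.fc.weight": [0.5, 0.5],
}
task_a = {
    "blocks.0.attn.qkv.weight": [1.5, 2.0],
    "blocks.0.mlp.fc.weight": [0.5, 0.5],
}
task_b = {
    "blocks.0.attn.qkv.weight": [1.0, 1.0],
    "blocks.0.mlp.fc.weight": [0.5, 0.5],
}

vectors = [task_vector(pretrained, task_a), task_vector(pretrained, task_b)]
merged = apply_task_vectors(pretrained, vectors, alpha=1.0)

# Names that the hypothetical rule would mark as trainable.
trainable = [name for name in pretrained if is_attention_linear(name)]
```

In this toy setup the frozen MLP weights have zero task vectors, so the merged model differs from the pre-trained one only in the attention weights. In practice `alpha` would typically be tuned on held-out data.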
