TOAST: Transfer Learning via Attention Steering

Abstract

Transfer learning involves adapting a pre-trained model to novel downstream tasks. However, we observe that current transfer learning methods often fail to focus on task-relevant features. In this work, we explore refocusing model attention for transfer learning. We introduce Top-Down Attention Steering (TOAST), a novel transfer learning algorithm that keeps the pre-trained backbone frozen, selects task-relevant features in the output, and feeds those features back to the model to steer the attention to the task-specific features. By refocusing the attention only, TOAST achieves state-of-the-art results on a number of transfer learning benchmarks, while having a small number of tunable parameters. Compared to full fine-tuning, LoRA, and prompt tuning, TOAST substantially improves performance across a range of fine-grained visual classification datasets (e.g., 81.1% -> 86.2% on FGVC). TOAST also outperforms the fully fine-tuned Alpaca and Vicuna models on instruction-following language generation. Code is available at https://github.com/bfshi/TOAST.
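
The two-pass idea described in the abstract (frozen backbone, feature selection on the output, feedback that steers attention) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The single attention layer, the cosine-similarity selection rule, the names `task_prior` and `w_feedback`, and the choice to add the feedback signal to the token input of the second pass are all assumptions made for illustration. See the linked repository for the actual method.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def frozen_attention(tokens, w_q, w_k, w_v):
    """Single-head self-attention standing in for the frozen backbone.

    tokens: (n_tokens, d); w_q, w_k, w_v: (d, d) pre-trained weights
    that are never updated in this sketch.
    """
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return weights @ v


def select_features(output, task_prior):
    """Assumed selection rule: scale each output token by its cosine
    similarity to a tunable task vector, clipped to [0, 1]."""
    norms = np.linalg.norm(output, axis=-1) * np.linalg.norm(task_prior) + 1e-8
    sim = (output @ task_prior) / norms
    return np.clip(sim, 0.0, 1.0)[:, None] * output


def steered_forward(tokens, frozen, task_prior, w_feedback):
    """Two passes through the same frozen layer.

    Only task_prior (d,) and w_feedback (d, d) would be trained;
    both are hypothetical parameters introduced for this sketch.
    """
    first = frozen_attention(tokens, *frozen)       # pass 1: plain forward
    selected = select_features(first, task_prior)   # keep task-relevant output
    top_down = selected @ w_feedback                # map back to input space
    # Pass 2: the feedback shifts queries and keys, so the attention
    # weights themselves change (an assumed injection point).
    return frozen_attention(tokens + top_down, *frozen)
```

With `w_feedback` set to zeros, the second pass reduces to the plain frozen forward pass, which makes the sketch easy to sanity-check: any change in behaviour comes only from the small set of tunable parameters.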
