What to Pre-Train on? Efficient Intermediate Task Selection

Abstract

Intermediate task fine-tuning has been shown to culminate in large transfer gains across many NLP tasks. With an abundance of candidate datasets as well as pre-trained language models, it has become infeasible to run the cross-product of all combinations to find the best transfer setting. In this work we first establish that similar sequential fine-tuning gains can be achieved in adapter settings, and subsequently consolidate previously proposed methods that efficiently identify beneficial tasks for intermediate transfer learning. We experiment with a diverse set of 42 intermediate and 11 target English classification, multiple choice, question answering, and sequence tagging tasks. Our results show that efficient embedding-based methods that rely solely on the respective datasets outperform computationally expensive few-shot fine-tuning approaches. Our best methods achieve an average Regret@3 of less than 1% across all target tasks, demonstrating that we are able to efficiently identify the best datasets for intermediate training.
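
The abstract reports Regret@3 without spelling out the formula. The sketch below assumes the commonly used relative definition: the gap between the best achievable target performance and the best performance among the k top-ranked intermediate tasks, divided by the best achievable performance and expressed in percent. The function name, task names, and scores are hypothetical placeholders for illustration, not values from the paper.

```python
def regret_at_k(predicted_scores, actual_performance, k=3):
    """Relative Regret@k in percent (assumed definition, see lead-in).

    predicted_scores: dict mapping intermediate task name -> score assigned
        by a selection method (higher means "expected to transfer better").
    actual_performance: dict mapping intermediate task name -> target task
        performance actually obtained after intermediate training on it.
    """
    # Rank intermediate tasks by the selection method's score, best first.
    ranked = sorted(predicted_scores, key=predicted_scores.get, reverse=True)
    top_k = ranked[:k]

    # Best performance any candidate could have delivered (the oracle choice).
    best_overall = max(actual_performance.values())
    # Best performance reachable if we only train on the k selected tasks.
    best_in_top_k = max(actual_performance[task] for task in top_k)

    return 100.0 * (best_overall - best_in_top_k) / best_overall


# Hypothetical example: four candidate intermediate tasks for one target task.
predicted = {"task_a": 0.9, "task_b": 0.8, "task_c": 0.7, "task_d": 0.1}
actual = {"task_a": 70.0, "task_b": 75.0, "task_c": 72.0, "task_d": 80.0}

regret = regret_at_k(predicted, actual, k=3)
print(f"Regret@3 = {regret:.2f}%")
```

Under this definition, a Regret@3 of 0% means the best intermediate task is among the top three selected, and a value below 1% means that training on the three selected tasks loses less than 1% relative to the oracle choice.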
