To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks

  • 2019-03-14 13:32:31
  • Matthew Peters, Sebastian Ruder, Noah A. Smith


While most previous work has focused on different pretraining objectives and architectures for transfer learning, we ask how to best adapt the pretrained model to a given target task. We focus on the two most common forms of adaptation, feature extraction (where the pretrained weights are frozen), and directly fine-tuning the pretrained model. Our empirical results across diverse NLP tasks with two state-of-the-art models show that the relative performance of fine-tuning vs. feature extraction depends on the similarity of the pretraining and target tasks. We explore possible explanations for this finding and provide a set of adaptation guidelines for the NLP practitioner.
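
To make the two forms of adaptation concrete, here is a minimal toy sketch in plain Python. It is not the authors' experimental setup: the one-parameter "encoder", the scalar task head, the data, and the function name `adapt` are all hypothetical and chosen only to show the mechanical difference. Feature extraction trains the task head alone, while fine-tuning also updates the pretrained weight.

```python
def adapt(data, w_pretrained, tune_encoder, lr=0.01, steps=200):
    """Fit y ~ v * (w * x) by gradient descent on mean squared error.

    w is a stand-in for pretrained encoder weights (hypothetical toy model);
    v is a task-specific head trained from scratch.
    """
    w = w_pretrained
    v = 0.0
    n = len(data)
    for _ in range(steps):
        grad_w = 0.0
        grad_v = 0.0
        for x, y in data:
            h = w * x                      # encoder output (the "feature")
            err = v * h - y                # prediction error
            grad_v += 2 * err * h / n      # d(loss)/dv
            grad_w += 2 * err * v * x / n  # d(loss)/dw
        v -= lr * grad_v                   # the head is always trained
        if tune_encoder:
            w -= lr * grad_w               # only fine-tuning touches the encoder
    return w, v


# Toy target task: y = 2 * x (illustrative data, not from the paper).
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

# Feature extraction: the pretrained weight stays frozen.
w_fe, v_fe = adapt(data, w_pretrained=1.5, tune_encoder=False)

# Fine-tuning: the pretrained weight is updated along with the head.
w_ft, v_ft = adapt(data, w_pretrained=1.5, tune_encoder=True)
```

In this sketch both settings fit the toy task, but only fine-tuning moves the encoder weight away from its pretrained value. The toy says nothing about which setting performs better on real tasks; that is the empirical question the abstract addresses.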

