ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning

  • 2021-11-22 02:34:46
  • Vamsi Aribandi, Yi Tay, Tal Schuster, Jinfeng Rao, Huaixiu Steven Zheng, Sanket Vaibhav Mehta, Honglei Zhuang, Vinh Q. Tran, Dara Bahri, Jianmo Ni, Jai Gupta, Kai Hui, Sebastian Ruder, Donald Metzler
  • 27


Despite the recent success of multi-task learning and transfer learning for natural language processing (NLP), few works have systematically studied the effect of scaling up the number of tasks during pre-training. Towards this goal, this paper introduces ExMix (Extreme Mixture): a massive collection of 107 supervised NLP tasks across diverse domains and task-families. Using ExMix, we study the effect of multi-task pre-training at the largest scale to date, and analyze co-training transfer amongst common families of tasks. Through this analysis, we show that manually curating an ideal set of tasks for multi-task pre-training is not straightforward, and that multi-task scaling can vastly improve models on its own. Finally, we propose ExT5: a model pre-trained using a multi-task objective of self-supervised span denoising and supervised ExMix. Via extensive experiments, we show that ExT5 outperforms strong T5 baselines on SuperGLUE, GEM, Rainbow, Closed-Book QA tasks, and several tasks outside of ExMix. ExT5 also significantly improves sample efficiency while pre-training.


