Multitask Prompted Training Enables Zero-Shot Task Generalization

  • 2021-10-15 17:08:57
  • Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, Zaid Alyafeai, Antoine Chaffin, Arnaud Stiegler, Teven Le Scao, Arun Raja, Manan Dey, M Saiful Bari, Canwen Xu, Urmish Thakker, Shanya Sharma Sharma, Eliza Szczechla, Taewoon Kim, Gunjan Chhablani, Nihal Nayak, Debajyoti Datta, Jonathan Chang, Mike Tian-Jian Jiang, Han Wang, Matteo Manica, Sheng Shen, Zheng Xin Yong, Harshit Pandey, Rachel Bawden, Thomas Wang, Trishala Neeraj, Jos Rozen, Abheesht Sharma, Andrea Santilli, Thibault Fevry, Jason Alan Fries, Ryan Teehan, Stella Biderman, Leo Gao, Tali Bers, Thomas Wolf, Alexander M. Rush

Abstract

Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks. It has been hypothesized that this is a consequence of implicit multitask learning in language model training. Can zero-shot generalization instead be directly induced by explicit multitask learning? To test this question at scale, we develop a system for easily mapping general natural language tasks into a human-readable prompted form. We convert a large set of supervised datasets, each with multiple prompts using varying natural language. These prompted datasets allow for benchmarking the ability of a model to perform completely unseen tasks specified in natural language. We fine-tune a pretrained encoder-decoder model on this multitask mixture covering a wide variety of tasks. The model attains strong zero-shot performance on several standard datasets, often outperforming models 16x its size. Further, our approach attains strong performance on a subset of tasks from the BIG-Bench benchmark, outperforming models 6x its size. All prompts and trained models are available at github.com/bigscience-workshop/promptsource/.
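
The idea of converting a supervised dataset into a human-readable prompted form, with several differently worded prompts per dataset, can be illustrated with a minimal sketch. The templates, label words, and the `to_prompted` helper below are hypothetical and written only for illustration; they are not taken from the paper's prompt collection and do not use the promptsource API.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical label verbalizations for a three-way inference dataset.
LABEL_WORDS = ["Yes", "Maybe", "No"]

# Hypothetical templates: several natural-language wordings of the same task.
TEMPLATES = [
    'Suppose "{premise}" Can we infer that "{hypothesis}"? Yes, no, or maybe?',
    '{premise}\nQuestion: {hypothesis} Yes, no, or maybe?',
    'Given that "{premise}", does it follow that "{hypothesis}"? Yes, no, or maybe?',
]


def to_prompted(example: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Render one labeled example as (input_text, target_text) pairs,
    producing one pair per template."""
    target = LABEL_WORDS[example["label"]]
    return [
        (t.format(premise=example["premise"], hypothesis=example["hypothesis"]), target)
        for t in TEMPLATES
    ]


example = {
    "premise": "A dog is running in the park.",
    "hypothesis": "An animal is outside.",
    "label": 0,
}
pairs = to_prompted(example)
for text, target in pairs:
    print(repr(text), "->", target)
```

Each pair is a plain text-to-text training instance, which is the kind of input an encoder-decoder model can be fine-tuned on; pooling such pairs across many datasets would give a multitask mixture in the sense the abstract describes.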

 
