Abstract
Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks. It has been hypothesized that this is a consequence of implicit multitask learning in language model training. Can zero-shot generalization instead be directly induced by explicit multitask learning? To test this question at scale, we develop a system for easily mapping general natural language tasks into a human-readable prompted form. We convert a large set of supervised datasets into this form, each with multiple prompts written in varying natural language. These prompted datasets allow for benchmarking the ability of a model to perform completely unseen tasks specified in natural language. We fine-tune a pretrained encoder-decoder model on this multitask mixture covering a wide variety of tasks. The model attains strong zero-shot performance on several standard datasets, often outperforming models 16x its size. Further, our approach attains strong performance on a subset of tasks from the BIG-Bench benchmark, outperforming models 6x its size. All prompts and trained models are available at github.com/bigscience-workshop/promptsource/.
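As a rough illustration of what mapping a supervised dataset into a prompted form involves, here is a minimal Python sketch. It is not the PromptSource API. It applies several templates to one natural language inference example, turning each (example, template) pair into an input text and a target text. The PromptTemplate class, the template wording, the field names premise/hypothesis/label, and the label words are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptTemplate:
    """Hypothetical template: a pattern for the model input and one for the target."""
    name: str
    input_pattern: str
    target_pattern: str

    def apply(self, example: dict) -> tuple[str, str]:
        # Fill both patterns with the fields of one supervised example.
        return (self.input_pattern.format(**example),
                self.target_pattern.format(**example))


# Hypothetical verbalizations for integer NLI labels 0, 1, 2.
LABELS = ["Yes", "Maybe", "No"]

# Two hypothetical prompts that phrase the same task in different natural language.
TEMPLATES = [
    PromptTemplate(
        "does_it_follow",
        "{premise}\nQuestion: Does this imply that \"{hypothesis}\"? Yes, no, or maybe?",
        "{answer}",
    ),
    PromptTemplate(
        "guaranteed_true",
        "Given that {premise} Is it guaranteed true that \"{hypothesis}\"? Yes, no, or maybe?",
        "{answer}",
    ),
]


def to_prompted(examples, templates):
    """Apply every template to every example, returning (input, target) text pairs."""
    pairs = []
    for ex in examples:
        # Add a text answer derived from the integer label.
        fields = dict(ex, answer=LABELS[ex["label"]])
        for template in templates:
            pairs.append(template.apply(fields))
    return pairs


# Hypothetical example record.
data = [{"premise": "A dog is running in the park.",
         "hypothesis": "An animal is outside.",
         "label": 0}]

pairs = to_prompted(data, TEMPLATES)
for inp, tgt in pairs:
    print(inp, "->", tgt)
```

Applying every template to every example is what yields several prompted versions of each dataset. Under this sketch, a multitask mixture would then concatenate such text pairs across many converted datasets before fine-tuning.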