It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners

Abstract

When scaled to hundreds of billions of parameters, pretrained language models such as GPT-3 (Brown et al., 2020) achieve remarkable few-shot performance on challenging natural language understanding benchmarks. In this work, we show that performance similar to GPT-3 can be obtained with language models whose parameter count is several orders of magnitude smaller. This is achieved by converting textual inputs into cloze questions that contain some form of task description, combined with gradient-based optimization; additionally exploiting unlabeled data gives further improvements. Based on our findings, we identify several key factors required for successful natural language understanding with small language models.
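To make the idea of converting an input into a cloze question more concrete, the following is a minimal sketch, not the paper's implementation. The pattern text, the `VERBALIZER` mapping, the `[MASK]` placeholder, and the `score_fn` callback are all hypothetical names chosen for illustration; in practice `score_fn` would be backed by a masked language model that scores a candidate word for the masked slot.

```python
from typing import Callable, Dict

MASK = "[MASK]"  # assumed placeholder; the real mask string depends on the model

# Hypothetical verbalizer: maps each task label to one word that can fill the slot.
VERBALIZER: Dict[str, str] = {"positive": "great", "negative": "terrible"}


def to_cloze(text: str) -> str:
    """Hypothetical pattern: append a short task description with one masked slot."""
    return f"{text} All in all, it was {MASK}."


def predict(text: str, score_fn: Callable[[str, str], float]) -> str:
    """Return the label whose verbalized word scores highest for the masked slot.

    score_fn(cloze, word) is an assumed interface standing in for a masked
    language model's score of `word` at the MASK position.
    """
    cloze = to_cloze(text)
    scores = {label: score_fn(cloze, word) for label, word in VERBALIZER.items()}
    return max(scores, key=scores.get)


def toy_score(cloze: str, word: str) -> float:
    """Toy stand-in for a language model, used only to make the sketch runnable."""
    looks_positive = "loved" in cloze
    return 1.0 if (word == "great") == looks_positive else 0.0
```

Calling `predict("I loved it.", toy_score)` selects the label whose word the toy scorer prefers. With a real masked language model behind `score_fn`, the same reformulation lets the model's pretraining objective be reused for classification, which is the setting in which gradient-based optimization on a few labeled examples is then applied.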
