The recent GPT-3 model (Brown et al., 2020) achieves remarkable few-shot performance solely by leveraging a natural-language prompt and a few task demonstrations as input context. Inspired by their findings, we study few-shot learning in a more practical scenario, where we use smaller language models for which fine-tuning is computationally efficient. We present LM-BFF--better few-shot fine-tuning of language models--a suite of simple and complementary techniques for fine-tuning language models on a small number of annotated examples. Our approach includes (1) prompt-based fine-tuning together with a novel pipeline for automating prompt generation; and (2) a refined strategy for dynamically and selectively incorporating demonstrations into each context. Finally, we present a systematic evaluation for analyzing few-shot performance on a range of NLP tasks, including classification and regression. Our experiments demonstrate that our methods combine to dramatically outperform standard fine-tuning procedures in this low-resource setting, achieving up to 30% absolute improvement, and 11% on average across all tasks. Our approach makes minimal assumptions on task resources and domain expertise, and hence constitutes a strong task-agnostic method for few-shot learning.