Larger language models do in-context learning differently

Abstract

We study how in-context learning (ICL) in language models is affected by semantic priors versus input-label mappings. We investigate two setups, ICL with flipped labels and ICL with semantically-unrelated labels, across various model families (GPT-3, InstructGPT, Codex, PaLM, and Flan-PaLM). First, experiments on ICL with flipped labels show that overriding semantic priors is an emergent ability of model scale. While small language models ignore flipped labels presented in-context and thus rely primarily on semantic priors from pretraining, large models can override semantic priors when presented with in-context exemplars that contradict priors, despite the stronger semantic priors that larger models may hold. We next study semantically-unrelated label ICL (SUL-ICL), in which labels are semantically unrelated to their inputs (e.g., foo/bar instead of negative/positive), thereby forcing language models to learn the input-label mappings shown in in-context exemplars in order to perform the task. The ability to do SUL-ICL also emerges primarily with scale, and large-enough language models can even perform linear classification in a SUL-ICL setting. Finally, we evaluate instruction-tuned models and find that instruction tuning strengthens both the use of semantic priors and the capacity to learn input-label mappings, but more of the former.
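
To make the two setups concrete, the following is a minimal sketch of how flipped-label and SUL-ICL prompts might be built for a sentiment task. It is an illustration only: the `Input:`/`Label:` template, the example sentences, and the helper names `remap_labels` and `build_prompt` are assumptions, not the prompt format or code used in the paper. Only the foo/bar versus negative/positive label pairing comes from the abstract.

```python
from typing import Dict, List, Tuple

Exemplar = Tuple[str, str]  # (input text, label)


def remap_labels(exemplars: List[Exemplar], mapping: Dict[str, str]) -> List[Exemplar]:
    """Return a copy of the exemplars with every label replaced via `mapping`."""
    return [(text, mapping[label]) for text, label in exemplars]


def build_prompt(exemplars: List[Exemplar], query: str) -> str:
    """Format exemplars and a final unlabeled query (hypothetical template)."""
    blocks = [f"Input: {text}\nLabel: {label}" for text, label in exemplars]
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)


# Made-up exemplars with their natural-language (semantically meaningful) labels.
exemplars: List[Exemplar] = [
    ("A delightful, moving film.", "positive"),
    ("Dull and far too long.", "negative"),
]

# Flipped-label ICL: the in-context labels contradict the semantic prior.
FLIPPED = {"positive": "negative", "negative": "positive"}

# SUL-ICL: labels carry no semantic relation to the inputs (foo/bar for negative/positive).
UNRELATED = {"negative": "foo", "positive": "bar"}

query = "An instant classic."
flipped_prompt = build_prompt(remap_labels(exemplars, FLIPPED), query)
sul_prompt = build_prompt(remap_labels(exemplars, UNRELATED), query)
```

In the flipped setup, a model that follows the in-context mapping would answer "negative" for a positive query, while a model relying on its semantic prior would answer "positive". In the SUL-ICL setup, the prior offers no help, so the only way to answer "bar" is to learn the mapping from the exemplars.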
