How much do language models memorize?

Abstract

We propose a new method for estimating how much a model knows about a datapoint and use it to measure the capacity of modern language models. Prior studies of language model memorization have struggled to disentangle memorization from generalization. We formally separate memorization into two components: unintended memorization, the information a model contains about a specific dataset, and generalization, the information a model contains about the true data-generation process. When we completely eliminate generalization, we can compute the total memorization, which provides an estimate of model capacity: our measurements estimate that GPT-style models have a capacity of approximately 3.6 bits per parameter. We train language models on datasets of increasing size and observe that models memorize until their capacity fills, at which point "grokking" begins, and unintended memorization decreases as models begin to generalize. We train hundreds of transformer language models ranging from 500K to 1.5B parameters and produce a series of scaling laws relating model capacity and data size to membership inference.
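
The capacity estimate lends itself to quick back-of-the-envelope arithmetic. The sketch below is not the paper's measurement procedure; it simply multiplies a parameter count by the reported figure of roughly 3.6 bits per parameter, assuming that constant holds across model sizes, and expresses the result in bits and megabytes. The names `capacity_bits`, `capacity_megabytes`, and `BITS_PER_PARAM` are hypothetical, chosen for illustration only.

```python
# Back-of-the-envelope capacity arithmetic (illustrative sketch only).
# Assumption: a constant ~3.6 bits of storable information per parameter,
# the estimate reported for GPT-style models.
BITS_PER_PARAM = 3.6


def capacity_bits(n_params: int, bits_per_param: float = BITS_PER_PARAM) -> float:
    """Estimated total memorization capacity, in bits."""
    return n_params * bits_per_param


def capacity_megabytes(n_params: int, bits_per_param: float = BITS_PER_PARAM) -> float:
    """Same estimate in megabytes (1 MB = 10**6 bytes = 8 * 10**6 bits)."""
    return capacity_bits(n_params, bits_per_param) / 8 / 1e6


if __name__ == "__main__":
    # Endpoints of the model-size range mentioned in the abstract.
    for n in (500_000, 1_500_000_000):
        print(f"{n:>13,} params -> {capacity_bits(n):.3g} bits "
              f"(~{capacity_megabytes(n):.3g} MB)")
```

Under this estimate, a 500K-parameter model stores on the order of 1.8 million bits (about 0.225 MB), while a 1.5B-parameter model stores roughly 5.4 billion bits (about 675 MB).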
