How much do language models memorize?

Abstract

We propose a new method for estimating how much a model ``knows'' about a datapoint and use it to measure the capacity of modern language models. Prior studies of language model memorization have struggled to disentangle memorization from generalization. We formally separate memorization into two components: \textit{unintended memorization}, the information a model contains about a specific dataset, and \textit{generalization}, the information a model contains about the true data-generation process. When we completely eliminate generalization, we can compute the total memorization, which provides an estimate of model capacity: our measurements estimate that GPT-style models have a capacity of approximately 3.6 bits per parameter. We train language models on datasets of increasing size and observe that models memorize until their capacity fills, at which point ``grokking'' begins, and unintended memorization decreases as models begin to generalize. We train hundreds of transformer language models ranging from $500K$ to $1.5B$ parameters and produce a series of scaling laws relating model capacity and data size to membership inference.
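
To give a sense of scale for the capacity figure, the following is a minimal arithmetic sketch, not the paper's measurement procedure. It assumes capacity scales linearly with parameter count at the quoted rate of roughly 3.6 bits per parameter; the function names `capacity_bits` and `capacity_megabytes` are hypothetical and chosen only for this illustration.

```python
# Approximate figure quoted in the abstract; treated here as a constant
# rate, which is an assumption made only for this illustration.
BITS_PER_PARAM = 3.6


def capacity_bits(num_params: int, bits_per_param: float = BITS_PER_PARAM) -> float:
    """Estimated total capacity in bits, assuming linear scaling."""
    return num_params * bits_per_param


def capacity_megabytes(num_params: int, bits_per_param: float = BITS_PER_PARAM) -> float:
    """The same estimate in megabytes (8 bits per byte, 10**6 bytes per MB)."""
    return capacity_bits(num_params, bits_per_param) / 8 / 1e6


if __name__ == "__main__":
    # The two ends of the model-size range mentioned in the abstract.
    for n in (500_000, 1_500_000_000):
        print(f"{n:>13,} params: {capacity_bits(n):.3e} bits, "
              f"{capacity_megabytes(n):.3f} MB")
```

Under this assumption, the smallest models in the stated range would hold on the order of a few hundred kilobytes of memorized information and the largest several hundred megabytes.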
