Non-Vacuous Generalization Bounds for Large Language Models

Abstract

Modern language models can contain billions of parameters, raising the question of whether they can generalize beyond the training data or simply parrot their training corpora. We provide the first non-vacuous generalization bounds for pretrained large language models (LLMs), indicating that language models are capable of discovering regularities that generalize to unseen data. In particular, we derive a compression bound that is valid for the unbounded log-likelihood loss using prediction smoothing, and we extend the bound to handle subsampling, accelerating bound computation by orders of magnitude on massive datasets. To achieve the extreme level of compression required for non-vacuous bounds, we devise SubLoRA, a simple low-dimensional nonlinear parameterization that leads to non-vacuous generalization bounds for models with nearly a billion parameters. Finally, we use our bounds to understand LLM generalization and find that larger models have better generalization bounds and are more compressible than smaller models.
