Principled Out-of-Distribution Generalization via Simplicity

Abstract

Modern foundation models exhibit remarkable out-of-distribution (OOD) generalization, solving tasks far beyond the support of their training data. However, the theoretical principles underpinning this phenomenon remain elusive. This paper investigates this problem by examining the compositional generalization abilities of diffusion models in image generation. Our analysis reveals that while neural network architectures are expressive enough to represent a wide range of models -- including many with undesirable behavior on OOD inputs -- the true, generalizable model that aligns with human expectations typically corresponds to the simplest among those consistent with the training data. Motivated by this observation, we develop a theoretical framework for OOD generalization via simplicity, quantified using a predefined simplicity metric. We analyze two key regimes: (1) the constant-gap setting, where the true model is strictly simpler than all spurious alternatives by a fixed gap, and (2) the vanishing-gap setting, where the fixed gap is replaced by a smoothness condition ensuring that models close in simplicity to the true model yield similar predictions. For both regimes, we study the regularized maximum likelihood estimator and establish the first sharp sample complexity guarantees for learning the true, generalizable, simple model.
