Understanding In-Context Learning on Structured Manifolds: Bridging Attention to Kernel Methods

Abstract

While in-context learning (ICL) has achieved remarkable success in natural language and vision domains, its theoretical understanding, particularly in the context of structured geometric data, remains unexplored. In this work, we initiate a theoretical study of ICL for regression of Hölder functions on manifolds. By establishing a novel connection between the attention mechanism and classical kernel methods, we derive generalization error bounds in terms of the prompt length and the number of training tasks. When a sufficient number of training tasks are observed, transformers give rise to the minimax regression rate of Hölder functions on manifolds, which scales exponentially with the intrinsic dimension of the manifold, rather than the ambient space dimension. Our result also characterizes how the generalization error scales with the number of training tasks, shedding light on the complexity of transformers as in-context algorithm learners. Our findings provide foundational insights into the role of geometry in ICL and novel tools to study ICL of nonlinear models.
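As a rough illustration of how attention can relate to kernel methods, the sketch below checks numerically that a single softmax attention readout with scores -||x - x_i||^2 / (2h^2) coincides with the Nadaraya-Watson kernel regression estimator under a Gaussian kernel. This is a standard textbook identity and is not claimed to be the construction used in the paper. The function names, the bandwidth `h`, the unit-circle data, and the target function are all assumptions made for the example.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def attention_predict(x_query, X_ctx, y_ctx, h=0.2):
    """One attention readout: the query attends over the prompt examples.

    Scores are negative squared distances scaled by a hypothetical
    bandwidth h; the values are the prompt labels.
    """
    scores = -np.sum((X_ctx - x_query) ** 2, axis=1) / (2 * h ** 2)
    weights = softmax(scores)
    return weights @ y_ctx


def nadaraya_watson(x_query, X_ctx, y_ctx, h=0.2):
    """Classical Gaussian-kernel regression estimate at x_query."""
    k = np.exp(-np.sum((X_ctx - x_query) ** 2, axis=1) / (2 * h ** 2))
    return (k @ y_ctx) / k.sum()


# Illustrative prompt: points on the unit circle, a 1-dimensional
# manifold embedded in R^2, labeled by a smooth function of the angle.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=200)
X_ctx = np.stack([np.cos(theta), np.sin(theta)], axis=1)
y_ctx = np.sin(3.0 * theta)

x_query = np.array([np.cos(1.0), np.sin(1.0)])
pred_attn = attention_predict(x_query, X_ctx, y_ctx)
pred_nw = nadaraya_watson(x_query, X_ctx, y_ctx)
```

On the unit circle, ||x - x_i||^2 = 2 - 2 x·x_i, so these scores differ from scaled dot-product scores x·x_i / h^2 only by a constant shift, which softmax ignores.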
