An Information-Theoretic Analysis of In-Context Learning

  • 2024-01-28 00:36:44
  • Hong Jun Jeon, Jason D. Lee, Qi Lei, Benjamin Van Roy

Abstract

Previous theoretical results pertaining to meta-learning on sequences build on contrived assumptions and are somewhat convoluted. We introduce new information-theoretic tools that lead to an elegant and very general decomposition of error into three components: irreducible error, meta-learning error, and intra-task error. These tools unify analyses across many meta-learning challenges. To illustrate, we apply them to establish new results about in-context learning with transformers. Our theoretical results characterize how error decays in both the number of training sequences and sequence lengths. Our results are very general; for example, they avoid contrived mixing time assumptions made by all prior results that establish decay of error with sequence length.

 
