Strategy Coopetition Explains the Emergence and Transience of In-Context Learning

Abstract

In-context learning (ICL) is a powerful ability that emerges in transformer models, enabling them to learn from context without weight updates. Recent work has established emergent ICL as a transient phenomenon that can sometimes disappear after long training times. In this work, we sought a mechanistic understanding of these transient dynamics. Firstly, we find that, after the disappearance of ICL, the asymptotic strategy is a remarkable hybrid between in-weights and in-context learning, which we term "context-constrained in-weights learning" (CIWL). CIWL is in competition with ICL, and eventually replaces it as the dominant strategy of the model (thus leading to ICL transience). However, we also find that the two competing strategies actually share sub-circuits, which gives rise to cooperative dynamics as well. For example, in our setup, ICL is unable to emerge quickly on its own, and can only be enabled through the simultaneous slow development of asymptotic CIWL. CIWL thus both cooperates and competes with ICL, a phenomenon we term "strategy coopetition." We propose a minimal mathematical model that reproduces these key dynamics and interactions. Informed by this model, we were able to identify a setup where ICL is truly emergent and persistent.
