Counterfactual reasoning: an analysis of in-context emergence

  • 2025-10-21 16:08:36
  • Moritz Miller, Bernhard Schölkopf, Siyuan Guo

Abstract

Large-scale neural language models exhibit remarkable performance in in-context learning: the ability to learn and reason about the input context on the fly. This work studies in-context counterfactual reasoning in language models, that is, the ability to predict consequences of a hypothetical scenario. We focus on a well-defined, synthetic linear regression task that requires noise abduction. Accurate prediction is based on (1) inferring an unobserved latent concept and (2) copying contextual noise from factual observations. We show that language models are capable of counterfactual reasoning. Further, we enhance existing identifiability results and reduce counterfactual reasoning for a broad class of functions to a transformation on in-context observations. In Transformers, we find that self-attention, model depth and pre-training data diversity drive performance. Moreover, we provide mechanistic evidence that the latent concept is linearly represented in the residual stream and we introduce designated "noise abduction heads" central to performing counterfactual reasoning. Lastly, our findings extend to counterfactual reasoning under SDE dynamics and reflect that Transformers can perform noise abduction on sequential data, providing preliminary evidence on the potential for counterfactual story generation. Our code is available at https://github.com/mrtzmllr/iccr.
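
The two steps named in the abstract (inferring a latent concept, then copying noise from a factual observation) can be illustrated with a small sketch. This is not the authors' code or their exact task: it assumes a scalar additive-noise model y = beta * x + u, estimates beta by ordinary least squares rather than with a Transformer, and every name below (counterfactual_prediction, context_x, x_cf, and so on) is hypothetical.

```python
import numpy as np


def counterfactual_prediction(context_x, context_y, x_factual, y_factual, x_cf):
    """Predict y at x_cf while keeping the noise of the factual observation.

    Assumes y = beta * x + u (an illustrative model, not the paper's exact setup).
    """
    # Step 1: infer the latent concept (here, the slope beta) from the context.
    beta_hat = float(context_x @ context_y / (context_x @ context_x))
    # Step 2: abduct the noise that produced the factual observation.
    u_hat = y_factual - beta_hat * x_factual
    # Step 3: apply the hypothetical input and reuse the abducted noise.
    return beta_hat * x_cf + u_hat


rng = np.random.default_rng(0)
beta_true = 1.5

# In-context examples generated under the same latent slope.
context_x = rng.normal(size=200)
context_y = beta_true * context_x + rng.normal(scale=0.1, size=200)

# One factual observation with a known noise term, so the result can be checked.
x_f, u_f = 0.7, 0.3
y_f = beta_true * x_f + u_f

y_cf = counterfactual_prediction(context_x, context_y, x_f, y_f, x_cf=-1.0)
y_cf_true = beta_true * (-1.0) + u_f
print(f"predicted: {y_cf:.3f}, ground truth: {y_cf_true:.3f}")
```

The counterfactual differs from an ordinary prediction at x_cf only in step 2: the noise term is recovered from the factual pair instead of being resampled or set to zero.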

 
