Transformers as Unrolled Inference in Probabilistic Laplacian Eigenmaps: An Interpretation and Potential Improvements

  • 2025-07-28 17:56:34
  • Aditya Ravuri, Neil D. Lawrence

Abstract

We propose a probabilistic interpretation of transformers as unrolled inference steps assuming a probabilistic Laplacian Eigenmaps model from the ProbDR framework. Our derivation shows that at initialisation, transformers perform "linear" dimensionality reduction. We also show that within the transformer block, a graph Laplacian term arises from our arguments, rather than an attention matrix (which we interpret as an adjacency matrix). We demonstrate that simply subtracting the identity from the attention matrix (and thereby taking a graph diffusion step) improves validation performance on a language model and a simple vision transformer.
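
The identity-subtraction step can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single attention head with no learned projections, so queries, keys and values are all the raw token embeddings; the helper names (`softmax_rows`, `matmul`, `attention_matrix`, `diffusion_step`) are hypothetical. Because each row of a softmax attention matrix A sums to one, A - I has rows summing to zero. It therefore behaves like a negative random-walk graph Laplacian, and applying it to the token embeddings gives one diffusion-style update.

```python
import math


def softmax_rows(scores):
    """Row-wise softmax, so every row of the result sums to 1."""
    out = []
    for row in scores:
        m = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def matmul(a, b):
    """Plain dense matrix product of two lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def attention_matrix(x):
    """Attention weights with Q = K = x (an assumption: no learned projections)."""
    d = len(x[0])
    x_t = [list(col) for col in zip(*x)]
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(x, x_t)]
    return softmax_rows(scores)


def diffusion_step(x):
    """Return (A - I) x: attention with the identity subtracted, applied to x."""
    a = attention_matrix(x)
    n = len(a)
    neg_laplacian = [
        [a[i][j] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)
    ]
    return neg_laplacian, matmul(neg_laplacian, x)


# Three toy tokens with two features each (illustrative values only).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attn = attention_matrix(tokens)
neg_lap, update = diffusion_step(tokens)
```

In this sketch `update` equals the ordinary attention output minus the tokens themselves, so an embedding that is identical across all tokens would be left unchanged by the step, as expected of a diffusion operator on a graph.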

 
