Can sparse autoencoders make sense of latent representations?

  • 2025-02-03 18:20:35
  • Viktoria Schuster

Abstract

Sparse autoencoders (SAEs) have lately been used to uncover interpretable latent features in large language models. Here, we explore their potential for decomposing latent representations in complex and high-dimensional biological data, where the underlying variables are often unknown. Using simulated data, we find that latent representations can encode observable and directly connected upstream hidden variables in superposition. The degree to which they are learned depends on the type of variable and the model architecture, favoring shallow and wide networks. Superpositions, however, are not identifiable if the generative variables are unknown. SAEs can recover these variables and their structure with respect to the observables. Applied to single-cell multi-omics data, we show that SAEs can uncover key biological processes. We further present an automated method for linking SAE features to biological concepts to enable large-scale analysis of single-cell expression models.

 
