Large-Scale Multi-omic Biosequence Transformers for Modeling Protein-Nucleic Acid Interactions

  • 2025-04-01 18:10:17
  • Sully F. Chen, Robert J. Steele, Glen M. Hocky, Beakal Lemeneh, Shivanand P. Lad, Eric K. Oermann

Abstract

The transformer architecture has revolutionized bioinformatics and driven progress in the understanding and prediction of the properties of biomolecules. Almost all research on large-scale biosequence transformers has focused on one domain at a time (single-omic), usually DNA/RNA or proteins. These models have seen incredible success in downstream tasks in each domain, and have achieved particularly noteworthy breakthroughs in sequence modeling and structural modeling. However, these single-omic models are naturally incapable of efficiently modeling multi-omic tasks, one of the most biologically critical being protein-nucleic acid interactions. We present our work training the largest open-source multi-omic foundation model to date. We show that these multi-omic models (MOMs) can learn joint representations between various single-omic distributions that are emergently consistent with the Central Dogma of molecular biology despite only being trained on unlabeled biosequences. We further demonstrate that MOMs can be fine-tuned to achieve state-of-the-art results on protein-nucleic acid interaction tasks, namely predicting the change in Gibbs free energy ($\Delta G$) of the binding interaction between a given nucleic acid and protein. Remarkably, we show that multi-omic biosequence transformers emergently learn useful structural information without any \textit{a priori} structural training, allowing us to predict which protein residues are most involved in the protein-nucleic acid binding interaction. Lastly, we provide evidence that multi-omic biosequence models are in many cases superior to foundation models trained on single-omics distributions, both in performance-per-FLOP and absolute performance, suggesting a more generalized or foundational approach to building these models for biology.

 
