Abstract
Structural biology relies on accurate three-dimensional biomolecular structures to advance our understanding of biological functions, disease mechanisms, and therapeutics. While recent advances in deep learning have enabled the development of all-atom foundation models for molecular modeling and generation, existing approaches face challenges in generalization due to the multi-modal nature of atomic data and the lack of comprehensive analysis of training and sampling strategies. To address these limitations, we propose PharMolixFM, a unified framework for constructing all-atom foundation models based on multi-modal generative techniques. Our framework includes three variants using state-of-the-art multi-modal generative models. By formulating molecular tasks as a generalized denoising process with task-specific priors, PharMolixFM achieves robust performance across various structural biology applications. Experimental results demonstrate that PharMolixFM-Diff achieves competitive prediction accuracy in protein-small-molecule docking (83.9% vs. 90.2% RMSD < 2 Å, given pocket) with significantly improved inference speed. Moreover, we explore the empirical inference scaling law by introducing more sampling repeats or steps. Our code and model are available at https://github.com/PharMolix/OpenBioMed.