Prefix-tree Decoding for Predicting Mass Spectra from Molecules

  • 2023-03-11 17:44:28
  • Samuel Goldman, John Bradshaw, Jiayi Xin, Connor W. Coley


Computational predictions of mass spectra from molecules have enabled the discovery of clinically relevant metabolites. However, such predictive tools are still limited as they occupy one of two extremes, either operating (a) by fragmenting molecules combinatorially with overly rigid constraints on potential rearrangements and poor time complexity or (b) by decoding lossy and nonphysical discretized spectra vectors. In this work, we introduce a new intermediate strategy for predicting mass spectra from molecules by treating mass spectra as sets of chemical formulae, which are themselves multisets of atoms. After first encoding an input molecular graph, we decode a set of chemical subformulae, each of which specifies a predicted peak in the mass spectrum, the intensities of which are predicted by a second model. Our key insight is to overcome the combinatorial possibilities for chemical subformulae by decoding the formula set using a prefix tree structure, atom type by atom type, representing a general method for ordered multiset decoding. We show promising empirical results on mass spectra prediction tasks.
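To make the prefix-tree idea concrete, the following is a minimal sketch in Python and not the authors' implementation. It assumes a fixed atom ordering (`ATOM_ORDER`), represents each tree node as a tuple of atom counts chosen so far, and replaces the learned model with a hypothetical callable `keep` that decides whether a prefix is expanded. The function name `decode_subformulae` and the ethanol example are illustrative choices only.

```python
from typing import Callable, Dict, List, Tuple

# Assumed fixed ordering of atom types; the abstract only says "atom type by atom type".
ATOM_ORDER = ["C", "H", "N", "O"]

Prefix = Tuple[int, ...]  # counts chosen so far, one per atom type already visited


def decode_subformulae(
    root: Dict[str, int],
    keep: Callable[[Prefix], bool],
    atom_order: List[str] = ATOM_ORDER,
) -> List[Dict[str, int]]:
    """Enumerate subformulae of `root` by growing a prefix tree level by level.

    `keep` is a hypothetical stand-in for a learned scorer: a prefix is only
    expanded further if `keep(prefix)` returns True.
    """
    frontier: List[Prefix] = [()]  # the empty prefix is the tree root
    for atom in atom_order:
        max_count = root.get(atom, 0)  # a subformula cannot exceed the parent formula
        next_frontier: List[Prefix] = []
        for prefix in frontier:
            for count in range(max_count + 1):
                child = prefix + (count,)
                if keep(child):
                    next_frontier.append(child)
        frontier = next_frontier
    # Each surviving leaf is a full assignment of counts to atom types.
    return [dict(zip(atom_order, leaf)) for leaf in frontier]


ethanol = {"C": 2, "H": 6, "O": 1}

# Without pruning, every combination of counts is produced: 3 * 7 * 1 * 2 = 42.
all_leaves = decode_subformulae(ethanol, keep=lambda prefix: True)

# A toy pruning rule that only keeps prefixes retaining both carbons.
pruned = decode_subformulae(ethanol, keep=lambda prefix: prefix[0] == 2)
```

The point of the tree structure is that a rejected prefix removes its whole subtree at once, so the number of candidates scored can stay far below the full product of per-atom counts when the pruning rule is selective.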


