We present multimodal neural posterior estimation (MultiNPE), a method to integrate heterogeneous data from different sources in simulation-based inference with neural networks. Inspired by advances in attention-based deep fusion learning, it enables researchers to analyze data from different domains and infer the parameters of complex mathematical models with increased accuracy. We formulate different multimodal fusion approaches for MultiNPE (early, late, and hybrid) and evaluate their performance in three challenging numerical experiments. MultiNPE not only outperforms naïve baselines on a benchmark model, but also achieves superior inference on representative scientific models from neuroscience and cardiology. In addition, we systematically investigate the impact of partially missing data on the different fusion strategies. Across our experiments, late and hybrid fusion techniques emerge as the methods of choice for practical applications of multimodal simulation-based inference.