CosmoBench: A Multiscale, Multiview, Multitask Cosmology Benchmark for Geometric Deep Learning

  • 2025-11-03 18:09:02
  • Ningyuan Huang, Richard Stiskalek, Jun-Young Lee, Adrian E. Bayer, Charles C. Margossian, Christian Kragh Jespersen, Lucia A. Perez, Lawrence K. Saul, Francisco Villaescusa-Navarro

Abstract

Cosmological simulations provide a wealth of data in the form of point clouds and directed trees. A crucial goal is to extract insights from this data that shed light on the nature and composition of the Universe. In this paper we introduce CosmoBench, a benchmark dataset curated from state-of-the-art cosmological simulations whose runs required more than 41 million core-hours and generated over two petabytes of data. CosmoBench is the largest dataset of its kind: it contains 34 thousand point clouds from simulations of dark matter halos and galaxies at three different length scales, as well as 25 thousand directed trees that record the formation history of halos on two different time scales. The data in CosmoBench can be used for multiple tasks: to predict cosmological parameters from point clouds and merger trees, to predict the velocities of individual halos and galaxies from their collective positions, and to reconstruct merger trees on finer time scales from those on coarser time scales. We provide several baselines on these tasks, some based on established approaches from cosmological modeling and others rooted in machine learning. For the latter, we study different approaches, from simple linear models that are minimally constrained by symmetries to much larger and more computationally demanding models in deep learning, such as graph neural networks. We find that least-squares fits with a handful of invariant features sometimes outperform deep architectures with many more parameters and far longer training time. Still, there remains tremendous potential to improve these baselines by combining machine learning and cosmology to fully exploit the data. CosmoBench sets the stage for bridging cosmology and geometric deep learning at scale. We invite the community to push the frontier of scientific discovery by engaging with this dataset, available at https://cosmobench.streamlit.app
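
To make the idea of a least-squares fit on a handful of invariant features concrete, the following is a minimal sketch in Python with NumPy. It is not the benchmark's baseline: the features (summary statistics of pairwise distances), the synthetic point clouds, the target parameter `theta`, and the function names `invariant_features`, `fit_least_squares`, and `predict` are all assumptions chosen for illustration. Pairwise distances are used because they do not change when a point cloud is rotated, translated, or reordered.

```python
import numpy as np

def invariant_features(points):
    """Summaries of pairwise distances (hypothetical feature choice).

    Distances between points are unchanged by rotations, translations,
    and permutations of the cloud, so these features are too.
    """
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    upper = np.triu_indices(len(points), k=1)  # each pair counted once
    d = dists[upper]
    return np.array([d.mean(), d.std(), np.median(d)])

def fit_least_squares(features, targets):
    """Ordinary least squares with an intercept column."""
    X = np.column_stack([features, np.ones(len(features))])
    coef, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return coef

def predict(coef, features):
    X = np.column_stack([features, np.ones(len(features))])
    return X @ coef

# Synthetic stand-in data (an assumption, not CosmoBench data):
# each cloud is a 3D Gaussian blob whose spread is set by theta.
rng = np.random.default_rng(0)
thetas = rng.uniform(0.5, 2.0, size=200)
clouds = [theta * rng.normal(size=(100, 3)) for theta in thetas]
feats = np.array([invariant_features(c) for c in clouds])

# Fit on the first 150 clouds, evaluate on the remaining 50.
coef = fit_least_squares(feats[:150], thetas[:150])
pred = predict(coef, feats[150:])
rmse = np.sqrt(np.mean((pred - thetas[150:]) ** 2))
print(f"held-out RMSE: {rmse:.3f}")
```

The model here has only four coefficients and trains in a single linear solve, which illustrates why such fits are cheap compared with deep architectures. Whether they are competitive on real data depends on how well the chosen invariants capture the target.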

 
