The C-index Multiverse

  • 2025-08-20 16:11:10
  • Begoña B. Sierra, Colin McLean, Peter S. Hall, Catalina A. Vallejos

Abstract

Quantifying out-of-sample discrimination performance for time-to-event outcomes is a fundamental step for model evaluation and selection in the context of predictive modelling. The concordance index, or C-index, is a widely used metric for this purpose, particularly with the growing development of machine learning methods. Beyond differences between proposed C-index estimators (e.g. Harrell's, Uno's and Antolini's), we demonstrate the existence of a C-index multiverse among available R and Python software, where seemingly equal implementations can yield different results. This can undermine reproducibility and complicate fair comparisons across models and studies. Key sources of variation include tie handling and adjustment for censoring. Additionally, the absence of a standardised approach to summarising risk from survival distributions results in another source of variation that depends on input types. We demonstrate the consequences of the C-index multiverse when quantifying predictive performance for several survival models (from Cox proportional hazards to recent deep learning approaches) on publicly available breast cancer data and semi-synthetic examples. Our work emphasises the need for better reporting to improve transparency and reproducibility. This article aims to be a useful guideline, helping analysts navigate the multiverse by providing unified documentation and highlighting potential pitfalls of existing software. All code is publicly available at: www.github.com/BBolosSierra/CindexMultiverse.
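To make the tie-handling point concrete, the following is a minimal sketch of a Harrell-style C-index in plain Python. It is not taken from the paper's repository or from any specific package: the function name `harrell_c`, the `risk_ties` option and its three values are hypothetical names chosen for illustration, and pairs with tied event times are simply skipped, which is itself only one of several possible conventions.

```python
from itertools import combinations


def harrell_c(times, events, risks, risk_ties="half"):
    """Harrell-style concordance on right-censored data (illustrative sketch).

    risk_ties (hypothetical option) controls pairs with equal predicted risk:
      "half"       -> pair is comparable and earns 0.5 credit
      "drop"       -> pair is excluded from the denominator
      "discordant" -> pair is comparable and earns no credit
    """
    if risk_ties not in {"half", "drop", "discordant"}:
        raise ValueError(f"unknown risk_ties option: {risk_ties!r}")

    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: tied times are skipped in this sketch
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier time is censored, so the pair is not comparable
        if risks[first] == risks[second]:
            if risk_ties == "drop":
                continue
            comparable += 1
            if risk_ties == "half":
                concordant += 0.5
        else:
            comparable += 1
            if risks[first] > risks[second]:
                concordant += 1.0  # higher risk failed earlier
    return concordant / comparable if comparable else float("nan")


# Toy data (made up for illustration): one pair has tied predicted risks.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.5, 0.5, 0.1]

results = {opt: harrell_c(times, events, risks, opt)
           for opt in ("half", "drop", "discordant")}
print(results)
```

On this toy input the same predictions score roughly 0.9, 1.0 or 0.8 depending only on how the single risk tie is treated, which is the kind of discrepancy that goes unnoticed when the tie-handling convention is not reported.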

 
