Forest-Guided Clustering -- Shedding Light into the Random Forest Black Box

  • 2025-07-25 17:41:39
  • Lisa Barros de Andrade e Sousa, Gregor Miller, Ronan Le Gleut, Dominik Thalmeier, Helena Pelin, Marie Piraud

Abstract

As machine learning models are increasingly deployed in sensitive application areas, the demand for interpretable and trustworthy decision-making has increased. Random Forests (RF), despite their widespread use and strong performance on tabular data, remain difficult to interpret due to their ensemble nature. We present Forest-Guided Clustering (FGC), a model-specific explainability method that reveals both local and global structure in RFs by grouping instances according to shared decision paths. FGC produces human-interpretable clusters aligned with the model's internal logic and computes cluster-specific and global feature importance scores to derive decision rules underlying RF predictions. FGC accurately recovered latent subclass structure on a benchmark dataset and outperformed classical clustering and post-hoc explanation methods. Applied to an AML transcriptomic dataset, FGC uncovered biologically coherent subpopulations, disentangled disease-relevant signals from confounders, and recovered known and novel gene expression patterns. FGC bridges the gap between performance and interpretability by providing structure-aware insights that go beyond feature-level attribution.
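
The idea of grouping instances by shared decision paths can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that "shared decision paths" can be approximated by how often two samples end up in the same leaf across the trees of a scikit-learn forest, and it swaps in average-linkage hierarchical clustering as a stand-in for whatever clustering step FGC actually uses. The helper names `forest_proximity` and `cluster_by_proximity`, the synthetic data, and the choice of three clusters are all hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def forest_proximity(forest, X):
    """Fraction of trees in which each pair of samples lands in the same leaf."""
    # apply() returns, for every sample, the index of the leaf reached in each tree.
    leaves = forest.apply(X)  # shape (n_samples, n_trees)
    same_leaf = leaves[:, None, :] == leaves[None, :, :]
    return same_leaf.mean(axis=2)


def cluster_by_proximity(proximity, n_clusters):
    """Cluster samples on the distance 1 - proximity (assumed stand-in step)."""
    distance = 1.0 - proximity
    np.fill_diagonal(distance, 0.0)
    condensed = squareform(distance, checks=False)
    tree = linkage(condensed, method="average")
    # Cut the dendrogram so that at most n_clusters groups remain.
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Synthetic data for illustration only; not the benchmark or AML data from the paper.
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=4, random_state=0
)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

proximity = forest_proximity(forest, X)
labels = cluster_by_proximity(proximity, n_clusters=3)
```

Because the proximity is computed from the trained forest rather than from raw feature distances, samples are grouped by how the model treats them. The resulting labels could then be compared against per-cluster feature distributions to look for the kind of cluster-specific patterns the abstract describes.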

 
