DCSI -- An improved measure of cluster separability based on separation and connectedness

  • 2025-04-10 14:55:36
  • Jana Gauss, Fabian Scheipl, Moritz Herrmann

Abstract

Whether class labels in a given data set correspond to meaningful clusters is crucial for the evaluation of clustering algorithms using real-world data sets. This property can be quantified by separability measures. The central aspects of separability for density-based clustering are between-class separation and within-class connectedness, and neither classification-based complexity measures nor cluster validity indices (CVIs) adequately incorporate them. A newly developed measure (density cluster separability index, DCSI) aims to quantify these two characteristics and can also be used as a CVI. Extensive experiments on synthetic data indicate that DCSI correlates strongly with the performance of DBSCAN measured via the adjusted Rand index (ARI) but lacks robustness when it comes to multi-class data sets with overlapping classes that are ill-suited for density-based hard clustering. Detailed evaluation on frequently used real-world data sets shows that DCSI can correctly identify touching or overlapping classes that do not correspond to meaningful density-based clusters.
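As a rough illustration of the evaluation criterion mentioned above, the following is a minimal sketch of the adjusted Rand index in its standard pair-counting form, which compares a clustering (for example, DBSCAN output) against given class labels. It is not the authors' code, and it does not compute DCSI. The function name `adjusted_rand_index` and the toy label vectors are assumptions made for this example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same points.

    Illustrative helper (hypothetical name), using the standard
    pair-counting definition. Every distinct label value is treated
    as its own group.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must have the same length")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        # Fewer than two points: the labelings agree trivially.
        return 1.0

    # Contingency table cells and its row/column marginals.
    cell_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of point pairs that share a group in both / in each labeling.
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    # Expected agreement under random labelings with fixed marginals.
    expected = sum_true * sum_pred / total_pairs
    max_index = 0.5 * (sum_true + sum_pred)

    if max_index == expected:
        # Degenerate case, e.g. both labelings put all points in one group.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Toy example: class labels versus three hypothetical clusterings.
classes = [0, 0, 1, 1]
same_up_to_renaming = adjusted_rand_index(classes, [1, 1, 0, 0])
everything_merged = adjusted_rand_index(classes, [0, 0, 0, 0])
crossed = adjusted_rand_index(classes, [0, 1, 0, 1])
```

An ARI of 1 means the clustering reproduces the class labels up to renaming, values near 0 correspond to chance-level agreement, and negative values indicate agreement below chance.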

 
