Abstract
Whether class labels in a given data set correspond to meaningful clusters is crucial for the evaluation of clustering algorithms using real-world data sets. This property can be quantified by separability measures. The central aspects of separability for density-based clustering are between-class separation and within-class connectedness, and neither classification-based complexity measures nor cluster validity indices (CVIs) adequately incorporate them. A newly developed measure (density cluster separability index, DCSI) aims to quantify these two characteristics and can also be used as a CVI. Extensive experiments on synthetic data indicate that DCSI correlates strongly with the performance of DBSCAN measured via the adjusted Rand index (ARI) but lacks robustness when it comes to multi-class data sets with overlapping classes that are ill-suited for density-based hard clustering. Detailed evaluation on frequently used real-world data sets shows that DCSI can correctly identify touching or overlapping classes that do not correspond to meaningful density-based clusters.