Abstract
In mixture models, nonspherical (anisotropic) noise within each cluster is widely present in real-world data. We study both the minimax rate and the optimal statistical procedure for clustering under high-dimensional nonspherical mixture models. We first establish the information-theoretic limits for clustering under high-dimensional Gaussian mixtures. The minimax lower bound reveals an intriguing informational dimension-reduction phenomenon: there is a substantial gap between the minimax rate and the oracle clustering risk, and the minimax rate is determined solely by the projected centers and projected covariance matrices in a low-dimensional space. Motivated by the lower bound, we propose a novel, computationally efficient clustering method: Covariance Projected Spectral Clustering (COPO). Its key step is to project the high-dimensional data onto the low-dimensional space spanned by the cluster centers and then use the projected covariance matrices in this space to enhance clustering. We establish tight algorithmic upper bounds for COPO, both for Gaussian noise with flexible covariance and for general noise with local dependence. Our theory shows that COPO is minimax-optimal in the Gaussian case and highlights its adaptivity to a broad spectrum of dependent noise. Extensive simulation studies under various noise structures and a real data analysis demonstrate the superior performance of our method.
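The key step described above (project onto the span of the centers, then exploit the projected covariances) can be illustrated with a minimal sketch. This is not the authors' COPO procedure. It rests on two assumptions of our own: the center subspace is estimated by the top-K right singular vectors of the data matrix, and "using the projected covariance matrices" is realized as a QDA-style Gaussian log-likelihood reassignment in the K-dimensional projected space, initialized by k-means. The function name `projected_covariance_clustering` and its parameters (`n_iter`, `ridge`, `seed`) are hypothetical.

```python
import numpy as np


def projected_covariance_clustering(X, K, n_iter=20, ridge=1e-6, seed=0):
    """Hypothetical sketch in the spirit of COPO; not the authors' algorithm.

    X: (n, p) data matrix; K: number of clusters. Returns labels in {0, ..., K-1}.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Step 1 (assumed estimator): the top-K right singular vectors of X
    # estimate the low-dimensional subspace spanned by the cluster centers.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    Y = X @ Vt[:K].T  # (n, K) projected data

    # Step 2: initial labels from Lloyd's k-means in the projected space,
    # seeded by farthest-point initialization.
    centers = [Y[rng.integers(n)]]
    for _ in range(1, K):
        dist = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[dist.argmax()])
    centers = np.array(centers)
    for _ in range(50):
        labels = ((Y[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        centers = np.array([Y[labels == k].mean(axis=0) if np.any(labels == k)
                            else centers[k] for k in range(K)])

    # Step 3 (assumed refinement): reassign each point by a Gaussian
    # log-likelihood that uses each cluster's projected K x K covariance.
    for _ in range(n_iter):
        scores = np.full((n, K), -np.inf)
        for k in range(K):
            Yk = Y[labels == k]
            if Yk.shape[0] <= K:  # too few points to estimate a K x K covariance
                continue
            mu = Yk.mean(axis=0)
            S = np.atleast_2d(np.cov(Yk, rowvar=False)) + ridge * np.eye(K)
            D = Y - mu
            maha = np.einsum("ij,ij->i", D, np.linalg.solve(S, D.T).T)
            _, logdet = np.linalg.slogdet(S)
            scores[:, k] = np.log(Yk.shape[0] / n) - 0.5 * (logdet + maha)
        new_labels = scores.argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```

The covariance step runs only in the K-dimensional projected space, so each cluster needs just a K x K covariance estimate rather than a p x p one. This mirrors, informally, the abstract's point that only the projected centers and projected covariances matter.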