Estimating the effective dimension of large biological datasets using Fisher separability analysis

  • 2019-01-18 16:32:11
  • Luca Albergante, Jonathan Bac, Andrei Zinovyev

Abstract

Modern large-scale datasets are often said to be high-dimensional. However, their data point clouds frequently possess structures that significantly decrease their intrinsic dimensionality (ID), owing to the presence of clusters, points lying close to low-dimensional varieties, or fine-grained lumping. We test a recently introduced dimensionality estimator, based on analysing the separability properties of data points, on several benchmarks and real biological datasets. We show that the introduced measure of ID has performance competitive with state-of-the-art measures: it remains efficient across a wide range of dimensions and performs better on noisy samples. Moreover, it allows the intrinsic dimension to be estimated in situations where the intrinsic manifold assumption is not valid.
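The abstract only names the estimator. The Python sketch below gives a rough idea of how a Fisher-separability ID estimate can be computed. It is written from our reading of the approach and is not the authors' code. Its assumptions are as follows. First, the data are only centred and projected onto the unit sphere; to our understanding the full method also reduces and whitens the data with PCA, which is skipped here. Second, a single separability parameter alpha = 0.8 is used instead of a scan over alpha values. Third, on the unit sphere a point x counts as not Fisher-separable from y when (x, y) > alpha. Fourth, the dimension n is recovered by inverting the asymptotic estimate p = (1 - alpha^2)^((n - 1)/2) / (alpha * sqrt(2*pi*n)) for the mean fraction of non-separable points. That inversion gives n = W0(L / (2*pi * p^2 * alpha^2 * (1 - alpha^2))) / L, where L = -ln(1 - alpha^2) and W0 is the principal branch of the Lambert W function. All function names here are hypothetical.

```python
import math
import random


def lambert_w0(z, tol=1e-12, max_iter=100):
    """Principal branch W0 of the Lambert W function for z >= 0 (Newton's method)."""
    if z < 0:
        raise ValueError("this sketch only handles z >= 0")
    w = math.log1p(z)  # starting guess from above: W0(z) <= log(1 + z) for z >= 0
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w


def center_and_project_to_sphere(points):
    """Subtract the mean, then scale every point to unit Euclidean norm.

    Assumption: the PCA reduction/whitening of the full method is skipped.
    """
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    projected = []
    for p in points:
        c = [p[j] - mean[j] for j in range(d)]
        norm = math.sqrt(sum(v * v for v in c))
        if norm > 0.0:  # a point exactly at the mean has no direction; drop it
            projected.append([v / norm for v in c])
    return projected


def fisher_separability_dimension(points, alpha=0.8):
    """Hypothetical helper: Fisher-separability ID estimate at one fixed alpha."""
    x = center_and_project_to_sphere(points)
    n = len(x)
    # On the unit sphere (x, x) = 1, so x is not separable from y when (x, y) > alpha.
    # The test is symmetric, so each unordered pair is checked once, credited to both.
    violations = [0] * n
    for i in range(n):
        xi = x[i]
        for j in range(i + 1, n):
            if sum(a * b for a, b in zip(xi, x[j])) > alpha:
                violations[i] += 1
                violations[j] += 1
    # Mean over points of the fraction of other points it is not separable from.
    p_mean = sum(v / (n - 1) for v in violations) / n
    if p_mean == 0.0:
        raise ValueError("no inseparable pairs found; lower alpha or add points")
    # Invert p = (1 - a^2)^((D - 1) / 2) / (a * sqrt(2 * pi * D)) for D:
    # D = W0(L / (2 pi p^2 a^2 (1 - a^2))) / L, with L = -ln(1 - a^2).
    q = 1.0 - alpha * alpha
    big_l = -math.log(q)
    z = big_l / (2.0 * math.pi * p_mean ** 2 * alpha ** 2 * q)
    return lambert_w0(z) / big_l


def gaussian_cloud(n_points, intrinsic_dim, ambient_dim, seed=0):
    """Isotropic Gaussian points in the first `intrinsic_dim` coordinates, zero-padded."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(intrinsic_dim)]
            + [0.0] * (ambient_dim - intrinsic_dim)
            for _ in range(n_points)]


if __name__ == "__main__":
    for k in (3, 10):
        est = fisher_separability_dimension(gaussian_cloud(800, k, 10))
        print(f"intrinsic dim {k}: estimate {est:.2f}")
```

The test data are isotropic Gaussian clouds that occupy 3 or 10 coordinates of a 10-dimensional space. For these, the estimate should land close to 3 and 10 respectively. The pairwise loop costs O(N^2), so this sketch only suits small illustrative samples.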
