Biclustering with Alternating K-Means

Abstract

Biclustering is the task of simultaneously clustering the rows and columns ofthe data matrix into different subgroups such that the rows and columns withina subgroup exhibit similar patterns. In this paper, we consider the case ofproducing block-diagonal biclusters. We provide a new formulation of thebiclustering problem based on the idea of minimizing the empirical clusteringrisk. We develop and prove a consistency result with respect to the empiricalclustering risk. Since the optimization problem is combinatorial in nature,finding the global minimum is computationally intractable. In light of thisfact, we propose a simple and novel algorithm that finds a local minimum byalternating the use of an adapted version of the k-means clustering algorithmbetween columns and rows. We evaluate and compare the performance of ouralgorithm to other related biclustering methods on both simulated data andreal-world gene expression data sets. The results demonstrate that ouralgorithm is able to detect meaningful structures in the data and outperformother competing biclustering methods in various settings and situations.

Quick Read (beta)

loading the full paper ...