Abstract
Work in deep clustering focuses on finding a single partition of data. However, high-dimensional data, such as images, typically feature multiple interesting characteristics one could cluster over. For example, images of objects against a background could be clustered over the shape of the object and separately by the colour of the background. In this paper, we introduce Multi-Facet Clustering Variational Autoencoders (MFCVAE), a novel class of variational autoencoders with a hierarchy of latent variables, each with a Mixture-of-Gaussians prior, that learns multiple clusterings simultaneously, and is trained fully unsupervised and end-to-end. MFCVAE uses a progressively-trained ladder architecture which leads to highly stable performance. We provide novel theoretical results for optimising the ELBO analytically with respect to the categorical variational posterior distribution, and correct earlier influential theoretical work. On image benchmarks, we demonstrate that our approach separates out and clusters over different aspects of the data in a disentangled manner. We also show other advantages of our model: its latent space is compositional, and it provides controlled generation of samples.