Comparing the information content of probabilistic representation spaces

Abstract

Probabilistic representation spaces convey information about a dataset, and to understand the effects of factors such as training loss and network architecture, we seek to compare the information content of such spaces. However, most existing methods to compare representation spaces assume representations are points, and neglect the distributional nature of probabilistic representations. Here, instead of building upon point-based measures of comparison, we build upon classic methods from literature on hard clustering. We generalize two information-theoretic methods of comparing hard clustering assignments to be applicable to general probabilistic representation spaces. We then propose a practical method of estimation that is based on fingerprinting a representation space with a sample of the dataset and is applicable when the communicated information is only a handful of bits. With unsupervised disentanglement as a motivating problem, we find information fragments that are repeatedly contained in individual latent dimensions in VAE and InfoGAN ensembles. Then, by comparing the full latent spaces of models, we find highly consistent information content across datasets, methods, and hyperparameters, even though there is often a point during training with substantial variety across repeat runs. Finally, we leverage the differentiability of the proposed method and perform model fusion by synthesizing the information content of multiple weak learners, each incapable of representing the global structure of a dataset. Across the case studies, the direct comparison of information content provides a natural basis for understanding the processing of information.
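
As background for the hard-clustering starting point, the sketch below computes two classic information-theoretic comparisons of hard clustering assignments: normalized mutual information (NMI) and variation of information (VI). The abstract does not name the two measures it generalizes, so this choice is an assumption made for illustration, and the code covers only the hard-clustering case, not the probabilistic generalization or the fingerprinting estimator. The function names are hypothetical, and the geometric-mean normalization of NMI is one of several conventions in use.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a hard clustering assignment."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in bits) between two assignments of the same items."""
    if len(a) != len(b):
        raise ValueError("assignments must label the same items")
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    # Sum over observed (cluster in a, cluster in b) pairs:
    # p(x, y) * log2( p(x, y) / (p(x) p(y)) )
    return sum(
        (c / n) * math.log2(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def variation_of_information(a, b):
    """VI = H(a) + H(b) - 2 I(a; b); zero when the partitions are identical."""
    return entropy(a) + entropy(b) - 2.0 * mutual_information(a, b)


def normalized_mutual_information(a, b):
    """NMI with geometric-mean normalization (an assumed convention)."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 or h_b == 0.0:
        # Degenerate case: at least one assignment has a single cluster.
        return 1.0 if h_a == h_b else 0.0
    return mutual_information(a, b) / math.sqrt(h_a * h_b)


# Two clusterings that agree up to relabeling, and one that is independent.
clustering_a = [0, 0, 1, 1]
clustering_b = [1, 1, 0, 0]  # same partition as clustering_a, labels swapped
clustering_c = [0, 1, 0, 1]  # statistically independent of clustering_a

vi_same = variation_of_information(clustering_a, clustering_b)
nmi_same = normalized_mutual_information(clustering_a, clustering_b)
vi_indep = variation_of_information(clustering_a, clustering_c)
nmi_indep = normalized_mutual_information(clustering_a, clustering_c)
```

Both measures depend only on the partitions and not on the label names, so the relabeled clustering is treated as identical to the original, while the independent clustering shares no information with it.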
