Exchangeable modelling of relational data: checking sparsity, train-test splitting, and sparse exchangeable Poisson matrix factorization

Abstract

A variety of machine learning tasks---e.g., matrix factorization, topicmodelling, and feature allocation---can be viewed as learning the parameters ofa probability distribution over bipartite graphs. Recently, a new class ofmodels for networks, the sparse exchangeable graphs, have been introduced toresolve some important pathologies of traditional approaches to statisticalnetwork modelling; most notably, the inability to model sparsity (in theasymptotic sense). The present paper explains some practical insights arisingfrom this work. We first show how to check if sparsity is relevant formodelling a given (fixed size) dataset by using network subsampling to identifya simple signature of sparsity. We discuss the implications of the (sparse)exchangeable subsampling theory for test-train dataset splitting; we arguecommon approaches can lead to biased results, and we propose a principledalternative. Finally, we study sparse exchangeable Poisson matrix factorizationas a worked example. In particular, we show how to adapt mean field variationalinference to the sparse exchangeable setting, allowing us to scale inference tohuge datasets.

Quick Read (beta)

loading the full paper ...