Two-sample Testing Using Deep Learning

Abstract

We propose a two-sample testing procedure based on learned deep neural network representations. To this end, we define two test statistics that perform an asymptotic location test on data samples mapped onto a hidden layer. The tests are consistent and asymptotically control the type-1 error rate. Their test statistics can be evaluated in linear time (in the sample size). Suitable data representations are obtained in a data-driven way, by solving a supervised or unsupervised transfer-learning task on an auxiliary (potentially distinct) data set. If no auxiliary data is available, we split the data into two chunks: one for learning representations and one for computing the test statistic. In experiments on audio samples, natural images and three-dimensional neuroimaging data, our tests yield significant decreases in type-2 error rate (up to 35 percentage points) compared to state-of-the-art two-sample tests such as kernel methods and classifier two-sample tests.
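
To illustrate the idea of a location test on hidden-layer features, the following is a minimal sketch and not the paper's implementation. The abstract does not spell out the two test statistics, so the sketch substitutes a classical Hotelling-type statistic with an asymptotic chi-square null distribution. The function names `hidden_features` and `location_test`, the random `weights` standing in for a learned representation, the ridge term, and the synthetic Gaussian data are all assumptions made for illustration.

```python
import numpy as np
from scipy.stats import chi2


def hidden_features(x, weights):
    """Hypothetical stand-in for the hidden layer of a trained network."""
    return np.tanh(x @ weights)


def location_test(feats_x, feats_y, ridge=1e-8):
    """Hotelling-type test for equal feature means (assumed statistic).

    Returns the statistic and its asymptotic chi-square p-value.
    """
    n, p = feats_x.shape
    m = feats_y.shape[0]
    mean_x = feats_x.mean(axis=0)
    mean_y = feats_y.mean(axis=0)
    diff = mean_x - mean_y
    # Pooled covariance of the centred features; one pass over n + m samples.
    cx = feats_x - mean_x
    cy = feats_y - mean_y
    pooled = (cx.T @ cx + cy.T @ cy) / (n + m - 2)
    pooled += ridge * np.eye(p)  # small ridge for numerical stability
    stat = (n * m / (n + m)) * (diff @ np.linalg.solve(pooled, diff))
    return stat, chi2.sf(stat, df=p)


rng = np.random.default_rng(0)
# Placeholder for learned weights, scaled so that tanh does not saturate.
weights = rng.normal(size=(10, 4)) / np.sqrt(10)

x = rng.normal(size=(500, 10))
y_same = rng.normal(size=(500, 10))              # same distribution as x
y_shift = rng.normal(loc=1.0, size=(500, 10))    # mean-shifted distribution

stat_null, p_null = location_test(
    hidden_features(x, weights), hidden_features(y_same, weights)
)
stat_alt, p_alt = location_test(
    hidden_features(x, weights), hidden_features(y_shift, weights)
)
print(f"same distribution:    stat={stat_null:.2f}, p={p_null:.3f}")
print(f"shifted distribution: stat={stat_alt:.2f}, p={p_alt:.3g}")
```

For a fixed feature dimension, the means and the pooled covariance are computed in a single pass over the data, so the cost grows linearly with the sample size. In practice the feature map would come from a network trained on auxiliary data or on a held-out chunk of the data, as described above, so that the representation is independent of the samples being tested.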
