Communication and Memory Efficient Testing of Discrete Distributions

Abstract

We study distribution testing with communication and memory constraints inthe following computational models: (1) The {\em one-pass streaming model}where the goal is to minimize the sample complexity of the protocol subject toa memory constraint, and (2) A {\em distributed model} where the data samplesreside at multiple machines and the goal is to minimize the communication costof the protocol. In both these models, we provide efficient algorithms foruniformity/identity testing (goodness of fit) and closeness testing (two sampletesting). Moreover, we show nearly-tight lower bounds on (1) the samplecomplexity of any one-pass streaming tester for uniformity, subject to thememory constraint, and (2) the communication cost of any uniformity testingprotocol, in a restricted `one-pass' model of communication.

Quick Read (beta)

loading the full paper ...