Scalable and consistent embedding of probability measures into Hilbert spaces via measure quantization

Abstract

This paper focuses on statistical learning from data that come as probability measures. In this setting, popular approaches consist in embedding such data into a Hilbert space with either Linearized Optimal Transport or Kernel Mean Embedding. However, the cost of computing such embeddings prohibits their direct use in large-scale settings. We study two methods based on measure quantization for approximating input probability measures with discrete measures of small support size. The first one is based on optimal quantization of each input measure, while the second one relies on mean-measure quantization. We study the consistency of such approximations and its implications for scalable embeddings of probability measures into a Hilbert space at a low computational cost. We finally illustrate our findings with various numerical experiments.
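
As an illustration of the first approach, the sketch below approximates the empirical measure of a point cloud by a discrete measure supported on k points. It is a minimal sketch under stated assumptions, not the paper's implementation: the function name `quantize_measure`, its parameters, and the use of Lloyd's algorithm (k-means) as a heuristic for optimal quantization are all choices made here for illustration.

```python
import numpy as np


def quantize_measure(samples, k, n_iter=50, seed=0):
    """Approximate the empirical measure of `samples` by a k-point measure.

    Hypothetical helper: uses Lloyd's algorithm (k-means) as a heuristic
    for optimal quantization. Returns (centers, weights), where `weights`
    holds the mass of each quantization cell and sums to 1.
    """
    samples = np.asarray(samples, dtype=float)
    rng = np.random.default_rng(seed)
    n = samples.shape[0]

    # Initialise the support with k distinct sample points.
    centers = samples[rng.choice(n, size=k, replace=False)].copy()

    for _ in range(n_iter):
        # Squared distances between every sample and every center: (n, k).
        d2 = ((samples[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = samples[labels == j]
            if len(members) > 0:  # keep the old center if a cell is empty
                centers[j] = members.mean(axis=0)

    # Final assignment so that the weights match the returned centers.
    d2 = ((samples[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    labels = d2.argmin(axis=1)
    weights = np.bincount(labels, minlength=k) / n
    return centers, weights
```

Any embedding can then be computed on the k weighted support points instead of the full sample, which is where the computational saving is expected to come from.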
