Abstract
This paper is focused on statistical learning from data that come as probability measures. In this setting, popular approaches consist in embedding such data into a Hilbert space with either Linearized Optimal Transport or Kernel Mean Embedding. However, the cost of computing such embeddings prohibits their direct use in large-scale settings. We study two methods based on measure quantization for approximating input probability measures with discrete measures of small-support size. The first one is based on optimal quantization of each input measure, while the second one relies on mean-measure quantization. We study the consistency of such approximations, and its implication for scalable embeddings of probability measures into a Hilbert space at a low computational cost. We finally illustrate our findings with various numerical experiments.