On The Statistical Representation Properties Of The Perturb-Softmax And The Perturb-Argmax Probability Distributions

Abstract

The Gumbel-Softmax probability distribution allows learning discrete tokens in generative learning, while the Gumbel-Argmax probability distribution is useful in learning discrete structures in discriminative learning. Despite the efforts invested in optimizing these probability models, their statistical properties are under-explored. In this work, we investigate their representation properties and determine for which families of parameters these probability distributions are complete, i.e., can represent any probability distribution, and minimal, i.e., can represent a probability distribution uniquely. We rely on convexity and differentiability to determine these statistical conditions and extend this framework to general probability models, such as Gaussian-Softmax and Gaussian-Argmax. We experimentally validate the qualities of these extensions, which enjoy a faster convergence rate. We conclude the analysis by identifying two sets of parameters that satisfy these assumptions and thus admit a complete and minimal representation. Our contribution is theoretical with supporting practical evaluation.
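
As a concrete illustration of the two constructions named in the abstract, the following minimal sketch samples from a Perturb-Argmax and a Perturb-Softmax model with NumPy. It is not taken from the paper: the function names, the parameter vector `theta`, the temperature value, and the sample size are all assumptions chosen for illustration. The only fact it relies on is the standard Gumbel-max property, that the argmax of `theta` plus i.i.d. standard Gumbel noise is distributed as `softmax(theta)`.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def perturb_argmax(theta, noise):
    # Perturb-Argmax: index of the largest perturbed parameter, per sample.
    return np.argmax(theta + noise, axis=-1)

def perturb_softmax(theta, noise, temperature=1.0):
    # Perturb-Softmax: softmax of the perturbed parameters, per sample.
    return softmax((theta + noise) / temperature, axis=-1)

rng = np.random.default_rng(0)
theta = np.array([1.0, 0.0, -1.0])   # hypothetical parameters
n = 200_000                          # hypothetical sample size

# Gumbel-Argmax: empirical frequencies should approximate softmax(theta).
gumbel = rng.gumbel(size=(n, theta.size))
idx = perturb_argmax(theta, gumbel)
empirical = np.bincount(idx, minlength=theta.size) / n
print("Gumbel-Argmax frequencies:", empirical)
print("softmax(theta):           ", softmax(theta))

# Gaussian-Softmax: same construction with standard normal noise.
gaussian = rng.standard_normal(size=(n, theta.size))
relaxed = perturb_softmax(theta, gaussian, temperature=0.5)
print("Mean Gaussian-Softmax sample:", relaxed.mean(axis=0))
```

Swapping the noise source is the only change needed to move between the Gumbel and Gaussian variants in this sketch; which parameter families make the resulting distributions complete and minimal is the question the paper addresses.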
