Abstract
The well-known Gumbel-Max trick for sampling from a categorical distributioncan be extended to sample $k$ elements without replacement. We show how toimplicitly apply this 'Gumbel-Top-$k$' trick on a factorized distribution oversequences, allowing to draw exact samples without replacement using aStochastic Beam Search. Even for exponentially large domains, the number ofmodel evaluations grows only linear in $k$ and the maximum sampled sequencelength. The algorithm creates a theoretical connection between sampling and(deterministic) beam search and can be used as a principled intermediatealternative. In a translation task, the proposed method compares favourablyagainst alternatives to obtain diverse yet good quality translations. We showthat sequences sampled without replacement can be used to constructlow-variance estimators for expected sentence-level BLEU score and modelentropy.