Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement

  • 2019-03-14 14:56:06
  • Wouter Kool, Herke van Hoof, Max Welling
  • 11


The well-known Gumbel-Max trick for sampling from a categorical distributioncan be extended to sample $k$ elements without replacement. We show how toimplicitly apply this 'Gumbel-Top-$k$' trick on a factorized distribution oversequences, allowing to draw exact samples without replacement using aStochastic Beam Search. Even for exponentially large domains, the number ofmodel evaluations grows only linear in $k$ and the maximum sampled sequencelength. The algorithm creates a theoretical connection between sampling and(deterministic) beam search and can be used as a principled intermediatealternative. In a translation task, the proposed method compares favourablyagainst alternatives to obtain diverse yet good quality translations. We showthat sequences sampled without replacement can be used to constructlow-variance estimators for expected sentence-level BLEU score and modelentropy.


Introduction (beta)



Conclusion (beta)