Fast Decoding in Sequence Models using Discrete Latent Variables

Abstract

Autoregressive sequence models based on deep neural networks, such as RNNs, WaveNet and the Transformer, attain state-of-the-art results on many tasks. However, they are difficult to parallelize and are thus slow at processing long sequences. RNNs lack parallelism both during training and decoding, while architectures like WaveNet and the Transformer are much more parallelizable during training, yet still operate sequentially during decoding. Inspired by [arxiv:1711.00937], we present a method to extend sequence models using discrete latent variables that makes decoding much more parallelizable. We first auto-encode the target sequence into a shorter sequence of discrete latent variables, which at inference time is generated autoregressively, and finally decode the output sequence from this shorter latent sequence in parallel. To this end, we introduce a novel method for constructing a sequence of discrete latent variables and compare it with previously introduced methods. Finally, we evaluate our model end-to-end on the task of neural machine translation, where it is an order of magnitude faster at decoding than comparable autoregressive models. While lower in BLEU than purely autoregressive models, our model achieves higher scores than previously proposed non-autoregressive translation models.
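To illustrate why a shorter latent sequence can make decoding faster, the following minimal sketch counts sequential decoding steps under simplifying assumptions. The function names, the `compression` parameter, and the example values are hypothetical and chosen only for illustration; they are not taken from the paper. The sketch assumes one sequential step per autoregressively generated symbol and a single parallel step to decode all output tokens from the latents.

```python
import math


def sequential_steps_autoregressive(target_len: int) -> int:
    """A purely autoregressive decoder emits one token per sequential step."""
    return target_len


def sequential_steps_latent(target_len: int, compression: int) -> int:
    """Sequential steps when decoding through a shorter latent sequence.

    Assumption (illustrative only): the latent sequence is `compression`
    times shorter than the target, is generated one latent per step, and
    the full output is then decoded from the latents in one parallel step.
    """
    latent_len = math.ceil(target_len / compression)
    return latent_len + 1  # autoregressive latent steps + one parallel decode


if __name__ == "__main__":
    n = 64  # hypothetical target length
    c = 8   # hypothetical compression factor
    print("autoregressive steps:", sequential_steps_autoregressive(n))
    print("latent-variable steps:", sequential_steps_latent(n, c))
```

This counts only sequential dependencies; actual wall-clock speedups also depend on the cost of each step and on the hardware.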
