You May Not Need Attention

  • 2018-10-31 17:09:37
  • Ofir Press, Noah A. Smith
  • 65

Abstract

In NMT, how far can we get without attention and without separate encoding and decoding? To answer that question, we introduce a recurrent neural translation model that does not use attention and does not have a separate encoder and decoder. Our eager translation model is low-latency, writing target tokens as soon as it reads the first source token, and uses constant memory during decoding. It performs on par with the standard attention-based model of Bahdanau et al. (2014), and better on long sentences.
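The two properties the abstract emphasizes, emitting a target token as soon as a source token is read and keeping decoding memory constant, can be illustrated with a toy decoding loop. This is a minimal sketch under stated assumptions, not the authors' architecture: `eager_translate`, `toy_step`, the tuple-shaped state, and the `<s>` start symbol are all hypothetical names introduced here, and the step function is a stand-in for a trained recurrent cell.

```python
from typing import Callable, Iterable, Iterator, Tuple

# Hypothetical fixed-size recurrent state; its size never grows with input length.
State = Tuple[float, float]
StepFn = Callable[[State, str, str], Tuple[State, str]]


def eager_translate(
    source: Iterable[str],
    step: StepFn,
    init_state: State,
    bos: str = "<s>",  # assumed start-of-sentence symbol
) -> Iterator[str]:
    """Yield one target token per source token, as soon as it is read.

    Only the current state and the previous target token are kept,
    so memory use does not depend on sentence length.
    """
    state = init_state
    prev_target = bos
    for src_tok in source:
        state, prev_target = step(state, src_tok, prev_target)
        yield prev_target


def toy_step(state: State, src_tok: str, prev_target: str) -> Tuple[State, str]:
    """Stand-in for a trained recurrent cell (assumption): uppercases the token."""
    count, _ = state
    new_state = (count + 1.0, float(len(prev_target)))
    return new_state, src_tok.upper()


if __name__ == "__main__":
    for tok in eager_translate(["ein", "kleines", "haus"], toy_step, (0.0, 0.0)):
        print(tok)
```

Because `eager_translate` is a generator, a caller can consume the first output token before the rest of the source has arrived, which is the low-latency behavior described above. No list of per-token encoder states is stored, in contrast to an attention-based decoder that keeps one state per source position.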

 
