Abstract
In this paper, we seek to understand neural machine translation (NMT) by simplifying NMT architectures and training encoder-free NMT models. In an encoder-free model, the sums of word embeddings and positional embeddings represent the source. The decoder is a standard Transformer or recurrent neural network that attends directly to these embeddings via attention mechanisms. Experimental results show (1) that the attention mechanism in encoder-free models acts as a strong feature extractor, (2) that the word embeddings in encoder-free models are competitive with those in conventional models, (3) that non-contextualized source representations lead to a large performance drop, and (4) that encoder-free models have different effects on alignment quality for German-English and Chinese-English.
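
To make the encoder-free source representation concrete, the following is a minimal sketch rather than the implementation used in the paper. It assumes sinusoidal positional embeddings, a randomly initialized embedding table, toy dimensions, and a single scaled dot-product attention query standing in for a decoder state; none of these choices are specified in the abstract, and all function and variable names are hypothetical.

```python
import math
import random


def sinusoidal_position(pos, dim):
    """Sinusoidal positional embedding (an assumed choice, not stated in the abstract)."""
    return [
        math.sin(pos / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(pos / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]


def encoder_free_source(token_ids, embedding_table):
    """Represent the source as word embedding + positional embedding, with no encoder layers."""
    dim = len(embedding_table[0])
    return [
        [w + p for w, p in zip(embedding_table[tok], sinusoidal_position(pos, dim))]
        for pos, tok in enumerate(token_ids)
    ]


def attend(query, source):
    """Scaled dot-product attention of one decoder query over the source vectors."""
    dim = len(query)
    scores = [sum(q * s for q, s in zip(query, vec)) / math.sqrt(dim) for vec in source]
    peak = max(scores)  # subtract the maximum for a numerically stable softmax
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * vec[i] for w, vec in zip(weights, source)) for i in range(dim)]
    return weights, context


random.seed(0)
VOCAB_SIZE, DIM = 10, 8  # hypothetical toy sizes
embedding_table = [
    [random.uniform(-1, 1) for _ in range(DIM)] for _ in range(VOCAB_SIZE)
]
source = encoder_free_source([3, 1, 4], embedding_table)
query = [random.uniform(-1, 1) for _ in range(DIM)]  # stand-in for a decoder state
weights, context = attend(query, source)
```

Here each source vector depends only on its own token and position, so the attention step is the only place where information from different source positions is combined.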