Deep Encoder, Shallow Decoder: Reevaluating the Speed-Quality Tradeoff in Machine Translation

Abstract

State-of-the-art neural machine translation models generate outputs autoregressively, where every step conditions on the previously generated tokens. This sequential nature causes inherent decoding latency. Non-autoregressive translation techniques, on the other hand, parallelize generation across positions and speed up inference at the expense of translation quality. Much recent effort has been devoted to non-autoregressive methods, aiming for a better balance between speed and quality. In this work, we re-examine the trade-off and argue that transformer-based autoregressive models can be substantially sped up without loss in accuracy. Specifically, we study autoregressive models with encoders and decoders of varied depths. Our extensive experiments show that given a sufficiently deep encoder, a one-layer autoregressive decoder yields state-of-the-art accuracy with comparable latency to strong non-autoregressive models. Our findings suggest that the latency disadvantage for autoregressive translation has been overestimated due to a suboptimal choice of layer allocation, and we provide a new speed-quality baseline for future research toward fast, accurate translation.
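One rough way to see why layer allocation matters is to count the layer applications that must run one after another during greedy decoding. The sketch below is a deliberately simplified cost model, not the latency measurement used in this work. It assumes that each encoder layer processes all source positions in parallel, so the encoder costs one sequential step per layer regardless of sentence length. It also assumes that the autoregressive decoder must run every one of its layers once per generated token. It ignores batching, beam search, key/value caching, and differences in per-layer compute. The function name `sequential_layer_steps`, the 25-token output length, and the two layer configurations are hypothetical choices made for illustration.

```python
def sequential_layer_steps(enc_layers: int, dec_layers: int, target_len: int) -> int:
    """Count layer applications that must run sequentially (simplified model).

    Assumption: the encoder runs once over all source positions in parallel,
    contributing enc_layers sequential steps; the autoregressive decoder runs
    all dec_layers once for each of the target_len generated tokens.
    """
    return enc_layers + dec_layers * target_len


target_len = 25  # hypothetical output length in tokens

# Two hypothetical allocations with a similar total layer count:
# a balanced 6-layer encoder / 6-layer decoder, and a deep 12-layer
# encoder paired with a single decoder layer.
for enc, dec in [(6, 6), (12, 1)]:
    steps = sequential_layer_steps(enc, dec, target_len)
    print(f"{enc}-{dec}: {steps} sequential layer steps")
# 6-6: 156 sequential layer steps
# 12-1: 37 sequential layer steps
```

Under these assumptions, moving depth from the decoder to the encoder leaves the total layer count almost unchanged but cuts the sequential work roughly fourfold. The per-token decoder term dominates once outputs are longer than a few tokens.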
