Aligned Cross Entropy for Non-Autoregressive Machine Translation

  • 2020-04-03 16:24:47
  • Marjan Ghazvininejad, Vladimir Karpukhin, Luke Zettlemoyer, Omer Levy

Abstract

Non-autoregressive machine translation models significantly speed up decoding by allowing for parallel prediction of the entire target sequence. However, modeling word order is more challenging due to the lack of autoregressive factors in the model. This difficulty is compounded during training with cross entropy loss, which can heavily penalize small shifts in word order. In this paper, we propose aligned cross entropy (AXE) as an alternative loss function for training non-autoregressive models. AXE uses a differentiable dynamic program to assign loss based on the best possible monotonic alignment between target tokens and model predictions. AXE-based training of conditional masked language models (CMLMs) substantially improves performance on major WMT benchmarks, while setting a new state of the art for non-autoregressive models.
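
To make the idea of a loss computed over the best monotonic alignment more concrete, here is a minimal sketch in plain Python. It is an illustration under assumptions, not the paper's implementation: the function name `aligned_loss`, the dedicated `blank` token, the `skip_target_penalty` parameter, and the exact form of the three alignment moves are all choices made for this example. It also only computes the loss value; a trainable version would need to be written in an automatic differentiation framework.

```python
import math

def aligned_loss(log_probs, target, blank, skip_target_penalty=1.0):
    """Minimum total negative log-likelihood over monotonic alignments.

    log_probs[j][v] is the log-probability of vocabulary item v at
    prediction position j; target is a list of token ids. The three
    moves below are assumptions made for this sketch.
    """
    n_tgt, n_pred = len(target), len(log_probs)
    # cost[i][j]: best cost of aligning the first i target tokens
    # with the first j prediction positions.
    cost = [[math.inf] * (n_pred + 1) for _ in range(n_tgt + 1)]
    cost[0][0] = 0.0
    for i in range(n_tgt + 1):
        for j in range(n_pred + 1):
            if i == 0 and j == 0:
                continue
            best = math.inf
            if i > 0 and j > 0:
                tok_lp = log_probs[j - 1][target[i - 1]]
                # Align target i with prediction j.
                best = min(best, cost[i - 1][j - 1] - tok_lp)
                # Skip target i: it reuses prediction j, scaled by a penalty.
                best = min(best, cost[i - 1][j] - skip_target_penalty * tok_lp)
            if j > 0:
                # Skip prediction j: it is charged for emitting the blank token.
                best = min(best, cost[i][j - 1] - log_probs[j - 1][blank])
            cost[i][j] = best
    return cost[n_tgt][n_pred]

def positional_cross_entropy(log_probs, target):
    """Ordinary cross entropy: target i is compared with position i only."""
    return -sum(log_probs[i][tok] for i, tok in enumerate(target))

# Toy vocabulary: 0 = blank, 1 = "a", 2 = "b".
def peaked(v):
    # Log-distribution with 0.9 on item v and 0.05 on each other item.
    return [math.log(0.9 if k == v else 0.05) for k in range(3)]

# The model predicts "blank a b", i.e. the target "a b" shifted right by one.
shifted_log_probs = [peaked(0), peaked(1), peaked(2)]
shifted_target = [1, 2]
axe_value = aligned_loss(shifted_log_probs, shifted_target, blank=0)
ce_value = positional_cross_entropy(shifted_log_probs, shifted_target)
```

In this toy case the predictions are correct but shifted by one position, so the position-by-position cross entropy is large while the aligned loss stays small, which is the behavior the abstract describes.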

 
