Optimally Segmenting Inputs for NMT Shows Preference for Character-Level Processing

Abstract

Most modern neural machine translation (NMT) systems rely on presegmented inputs. Segmentation granularity determines the input and output sequence lengths, and hence the modeling depth, as well as the source and target vocabularies, which in turn determine model size, the computational cost of softmax normalization, and the handling of out-of-vocabulary words. However, the current practice is to use static, heuristic-based segmentations that are fixed before NMT training. This raises the question of whether the chosen segmentation is optimal for the translation task. To overcome suboptimal segmentation choices, we present an algorithm for dynamic segmentation, based on the Adaptive Computation Time algorithm (Graves 2016), that is trainable end-to-end and driven by the NMT objective. In an evaluation on three translation tasks, we found that, given the freedom to navigate between different segmentation levels, the model prefers to operate at (almost) the character level, providing support for purely character-level NMT models from a novel angle.
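
The trade-off described above, where segmentation granularity sets both sequence length and vocabulary size, can be illustrated with a minimal sketch. This is not the paper's dynamic segmentation algorithm. It only compares two static segmentations on a toy corpus, and the helper names (segment_stats, word_segmenter, char_segmenter), the "_" boundary marker, and the corpus itself are assumptions made for illustration.

```python
def segment_stats(sentences, segmenter):
    """Return (average sequence length, vocabulary size) under a segmenter."""
    segmented = [segmenter(s) for s in sentences]
    vocab = {token for seq in segmented for token in seq}
    avg_len = sum(len(seq) for seq in segmented) / len(segmented)
    return avg_len, len(vocab)


def word_segmenter(sentence):
    # Hypothetical word-level segmentation: split on whitespace.
    return sentence.split()


def char_segmenter(sentence):
    # Hypothetical character-level segmentation; "_" marks word boundaries
    # so the original sentence stays recoverable.
    return list(sentence.replace(" ", "_"))


# Toy corpus (an assumption for illustration, not data from the paper).
corpus = [
    "sam eats meat",
    "tess meets a mate",
    "a team eats at sea",
    "same meat same taste",
]

word_len, word_vocab = segment_stats(corpus, word_segmenter)
char_len, char_vocab = segment_stats(corpus, char_segmenter)

print(f"word level: avg length {word_len:.1f}, vocabulary {word_vocab}")
print(f"char level: avg length {char_len:.1f}, vocabulary {char_vocab}")
```

On this toy corpus the character-level sequences are several times longer than the word-level ones, while the character vocabulary is smaller. That is the tension between sequence length and softmax size that the abstract refers to.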
