BPE-Dropout: Simple and Effective Subword Regularization

Abstract

Subword segmentation is widely used to address the open vocabulary problem in machine translation. The dominant approach to subword segmentation is Byte Pair Encoding (BPE), which keeps the most frequent words intact while splitting the rare ones into multiple tokens. While multiple segmentations are possible even with the same vocabulary, BPE splits words into unique sequences; this may prevent a model from better learning the compositionality of words and being robust to segmentation errors. So far, the only way to overcome this BPE imperfection, its deterministic nature, was to create another subword segmentation algorithm (Kudo, 2018). In contrast, we show that BPE itself incorporates the ability to produce multiple segmentations of the same word. We introduce BPE-dropout, a simple and effective subword regularization method based on and compatible with conventional BPE. It stochastically corrupts the segmentation procedure of BPE, which leads to producing multiple segmentations within the same fixed BPE framework. Using BPE-dropout during training and the standard BPE during inference improves translation quality by up to 3 BLEU compared to BPE and by up to 0.9 BLEU compared to the previous subword regularization.
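
The idea of stochastically corrupting the BPE segmentation procedure can be illustrated with a minimal Python sketch. It assumes that "corruption" means skipping each candidate merge independently with probability p at every merge step, and that segmentation stops when no candidate merge survives. The function name `bpe_dropout_segment`, the `merges` list format, and the toy merge table are hypothetical choices made for illustration; this is not the authors' reference implementation.

```python
import random


def bpe_dropout_segment(word, merges, p=0.1, rng=None):
    """Segment `word` using an ordered BPE merge table.

    Assumption: every candidate merge is skipped with probability `p`
    at each step. With p=0 this reduces to deterministic BPE; with p=1
    the word stays split into characters.
    """
    rng = rng or random
    # Lower rank means the merge was learned earlier (higher priority).
    ranks = {pair: i for i, pair in enumerate(merges)}
    tokens = list(word)
    while len(tokens) > 1:
        # Adjacent pairs that are in the merge table and survive dropout.
        candidates = [
            (ranks[(a, b)], i)
            for i, (a, b) in enumerate(zip(tokens, tokens[1:]))
            if (a, b) in ranks and rng.random() >= p
        ]
        if not candidates:
            break  # nothing left to merge in this step
        _, i = min(candidates)  # highest-priority surviving merge, leftmost on ties
        tokens[i:i + 2] = [tokens[i] + tokens[i + 1]]
    return tokens


# Hypothetical toy merge table, ordered by priority.
toy_merges = [("l", "o"), ("lo", "w"), ("e", "r"), ("low", "er")]
deterministic = bpe_dropout_segment("lower", toy_merges, p=0.0)
sampled = bpe_dropout_segment("lower", toy_merges, p=0.5, rng=random.Random(0))
```

Under these assumptions, calling the function with p > 0 during training yields varying segmentations of the same word, while p = 0 at inference recovers the standard BPE segmentation.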
