Empirical Evaluation of Sequence-to-Sequence Models for Word Discovery in Low-resource Settings

Abstract

Since Bahdanau et al. [1] first introduced attention for neural machine translation, most sequence-to-sequence models have made use of attention mechanisms [2, 3, 4]. While these models produce soft-alignment matrices that can be interpreted as alignments between target and source languages, we lack metrics to quantify their quality, so it remains unclear which approach produces the best alignments. This paper presents an empirical evaluation of three main sequence-to-sequence models (CNN, RNN and Transformer-based) for word discovery from unsegmented phoneme sequences. This task consists of aligning word sequences in a source language with phoneme sequences in a target language, and inferring from the alignment a word segmentation on the target side [5]. Evaluating word segmentation quality can be seen as an extrinsic evaluation of the soft-alignment matrices produced during training. Our experiments in a low-resource scenario on Mboshi and English (both aligned to French) show that RNNs surprisingly outperform CNNs and Transformers for this task. Our results are confirmed by an intrinsic evaluation of alignment quality through the use of Average Normalized Entropy (ANE). Lastly, we improve our best word discovery model by using an alignment entropy confidence measure that accumulates ANE over all the occurrences of a given alignment pair in the collection.
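To make the two evaluations mentioned above concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not define ANE, so the sketch assumes that each phoneme's normalized entropy is the entropy of its alignment distribution over the source words divided by the log of the number of source words, and that ANE is the mean over phonemes. It also assumes that segmentation places a boundary wherever two neighbouring phonemes have different most-probable source words. All function names are hypothetical.

```python
import math

def normalized_entropy(row):
    """Entropy of one phoneme's alignment distribution, scaled to [0, 1].

    Assumption: normalization divides by log(number of source words).
    """
    n = len(row)
    if n < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in row if p > 0.0)
    return entropy / math.log(n)

def average_normalized_entropy(matrix):
    """Mean normalized entropy over all phonemes (rows) of a soft-alignment matrix."""
    return sum(normalized_entropy(row) for row in matrix) / len(matrix)

def segment(phonemes, matrix):
    """Group phonemes into words; a boundary is placed whenever the
    most-probable source word changes between neighbouring phonemes."""
    def best_source(row):
        return max(range(len(row)), key=row.__getitem__)

    words = []
    current = [phonemes[0]]
    previous = best_source(matrix[0])
    for phoneme, row in zip(phonemes[1:], matrix[1:]):
        best = best_source(row)
        if best != previous:
            words.append("".join(current))
            current = []
        current.append(phoneme)
        previous = best
    words.append("".join(current))
    return words

# Toy example: 4 phonemes (rows) aligned to 2 source words (columns).
toy_matrix = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
toy_words = segment(["a", "b", "c", "d"], toy_matrix)
toy_ane = average_normalized_entropy(toy_matrix)
```

Under these assumptions, a lower ANE indicates sharper, more confident alignments, while a value near 1 indicates near-uniform attention over the source words.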
