Quantifying Exposure Bias for Neural Language Generation

Abstract

The exposure bias problem refers to the training-generation discrepancy, caused by teacher forcing, in maximum likelihood estimation (MLE) training for auto-regressive neural network language models (LM). It has been regarded as a central problem for neural language generation (NLG) model training. Although many algorithms have been proposed to avoid teacher forcing and therefore to alleviate exposure bias, there is little work showing how serious the exposure bias problem actually is. In this work, we first identify the self-recovery ability of MLE-trained LMs, which casts doubt on the seriousness of exposure bias. We then propose sequence-level (EB-bleu) and word-level (EB-C) metrics to quantify the impact of exposure bias. We conduct experiments for LSTM/transformer models, in both real and synthetic settings. In addition to the unconditional NLG task, we also include results for a seq2seq machine translation task. Surprisingly, all our measurements indicate that removing the training-generation discrepancy brings only a very small performance gain. In our analysis, we hypothesise that although there exists a mismatch between the model distribution and the data distribution, the mismatch is still in the model's "comfortable zone", and is not big enough to induce significant performance loss.
