Token-level and sequence-level loss smoothing for RNN language models

Abstract

Despite the effectiveness of recurrent neural network language models, their maximum likelihood estimation suffers from two limitations. First, it treats all sentences that do not match the ground truth as equally poor, ignoring the structure of the output space. Second, it suffers from "exposure bias": during training, tokens are predicted given ground-truth sequences, while at test time prediction is conditioned on generated output sequences. To overcome these limitations, we build upon the recent reward augmented maximum likelihood approach, i.e., sequence-level smoothing, which encourages the model to predict sentences close to the ground truth according to a given performance metric. We extend this approach to token-level loss smoothing, and propose improvements to the sequence-level smoothing approach. Our experiments on two different tasks, image captioning and machine translation, show that token-level and sequence-level loss smoothing are complementary, and significantly improve results.
