On Accurate Evaluation of GANs for Language Generation

  • 2018-06-14 12:25:16
  • Stanislau Semeniuta, Aliaksei Severyn, Sylvain Gelly

Abstract

Generative Adversarial Networks (GANs) are a promising approach to language generation. The latest works introducing novel GAN models for language generation use n-gram based metrics for evaluation and only report single scores of the best run. In this paper, we argue that this often misrepresents the true picture and does not tell the full story, as GAN models can be extremely sensitive to the random initialization and small deviations from the best hyperparameter choice. In particular, we demonstrate that the previously used BLEU score is not sensitive to semantic deterioration of generated texts and propose alternative metrics that better capture the quality and diversity of the generated samples. We also conduct a set of experiments comparing a number of GAN models for text with a conventional Language Model (LM) and find that none of the considered models performs convincingly better than the LM.

 
