Towards Realistic Practices In Low-Resource Natural Language Processing: The Development Set

  • 2019-09-15 00:38:42
  • Katharina Kann, Kyunghyun Cho, Samuel R. Bowman

Abstract

Development sets are impractical to obtain for real low-resource languages, since using all available data for training is often more effective. However, development sets are widely used in research papers that purport to deal with low-resource natural language processing (NLP). Here, we aim to answer the following questions: Does using a development set for early stopping in the low-resource setting influence results as compared to a more realistic alternative, where the number of training epochs is tuned on development languages? And does it lead to overestimation or underestimation of performance? We repeat multiple experiments from recent work on neural models for low-resource NLP and compare results for models obtained by training with and without development sets. On average over languages, absolute accuracy differs by up to 1.4%. However, for some languages and tasks, differences are as big as 18.0% accuracy. Our results highlight the importance of realistic experimental setups in the publication of low-resource NLP research results.
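
To make the two setups in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes that "tuning the number of training epochs on development languages" means taking the best epoch for each development language and averaging those counts; the abstract does not specify the aggregation. The helper names, language labels, and accuracy values are all hypothetical.

```python
from statistics import mean


def best_epoch(dev_accuracies):
    """Return the 1-based epoch with the highest dev accuracy (earliest on ties)."""
    best_index = max(range(len(dev_accuracies)), key=lambda i: dev_accuracies[i])
    return best_index + 1


def epochs_from_development_languages(curves_by_language):
    """Average the per-language best epochs into one fixed epoch budget.

    Assumption: averaging is one plausible aggregation; the abstract does not
    state which one is used. Rounding uses Python's built-in round().
    """
    return round(mean(best_epoch(curve) for curve in curves_by_language.values()))


# Hypothetical per-epoch dev accuracies for three development languages.
development_curves = {
    "lang_a": [0.40, 0.55, 0.61, 0.60],
    "lang_b": [0.35, 0.50, 0.58, 0.62, 0.61],
    "lang_c": [0.42, 0.57, 0.56],
}

# Setup without a target-language dev set: the epoch count is fixed in advance.
fixed_epochs = epochs_from_development_languages(development_curves)

# Setup with a target-language dev set: early stopping picks the best epoch.
# Hypothetical per-epoch accuracies for the target language.
target_curve = [0.30, 0.45, 0.52, 0.50, 0.58]
early_stopping_epoch = best_epoch(target_curve)

# Difference in accuracy between the two stopping rules on the target language.
accuracy_gap = target_curve[early_stopping_epoch - 1] - target_curve[fixed_epochs - 1]
print(fixed_epochs, early_stopping_epoch, accuracy_gap)
```

In this toy example the two stopping rules select different epochs, which is the kind of discrepancy the paper sets out to measure across languages and tasks.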

 
