Looking for ELMo's friends: Sentence-Level Pretraining Beyond Language Modeling

  • 2018-12-28 01:21:17
  • Samuel R. Bowman, Ellie Pavlick, Edouard Grave, Benjamin Van Durme, Alex Wang, Jan Hula, Patrick Xia, Raghavendra Pappagari, R. Thomas McCoy, Roma Patel, Najoung Kim, Ian Tenney, Yinghui Huang, Katherin Yu, Shuning Jin, Berlin Chen
Work on the problem of contextualized word representation -- the development of reusable neural network components for sentence understanding -- has recently seen a surge of progress centered on the unsupervised pretraining task of language modeling with methods like ELMo. This paper contributes the first large-scale systematic study comparing different pretraining tasks in this context, both as complements to language modeling and as potential alternatives. The primary results of the study support the use of language modeling as a pretraining task and set a new state of the art among comparable models using multitask learning with language models. However, a closer look at these results reveals worryingly strong baselines and strikingly varied results across target tasks, suggesting that the widely-used paradigm of pretraining and freezing sentence encoders may not be an ideal platform for further work.


