Can You Tell Me How to Get Past Sesame Street? Sentence-Level Pretraining Beyond Language Modeling

  • 2019-06-09 18:00:27
  • Samuel R. Bowman, Ellie Pavlick, Edouard Grave, Benjamin Van Durme, Alex Wang, Jan Hula, Patrick Xia, Raghavendra Pappagari, R. Thomas McCoy, Roma Patel, Najoung Kim, Ian Tenney, Yinghui Huang, Katherin Yu, Shuning Jin, Berlin Chen

Abstract

Natural language understanding has recently seen a surge of progress with the use of sentence encoders like ELMo (Peters et al., 2018a) and BERT (Devlin et al., 2019), which are pretrained on variants of language modeling. We conduct the first large-scale systematic study of candidate pretraining tasks, comparing 19 different tasks both as alternatives and as complements to language modeling. Our primary results support the use of language modeling, especially when combined with pretraining on additional labeled-data tasks. However, our results are mixed across pretraining tasks and show some concerning trends: in ELMo's pretrain-then-freeze paradigm, random baselines are worryingly strong and results vary strikingly across target tasks. In addition, fine-tuning BERT on an intermediate task often negatively impacts downstream transfer. In a more positive trend, we see modest gains from multitask training, suggesting the development of more sophisticated multitask and transfer learning techniques as an avenue for further research.
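To make the contrast between the pretrain-then-freeze paradigm and fine-tuning concrete, here is a minimal toy sketch. It is not the authors' code or setup: the one-layer tanh "encoder", the logistic-regression head, the toy data, and every function name (make_encoder, encode, train, predict) are hypothetical illustrations. The only thing that differs between the two regimes is whether gradients are allowed to update the encoder weights. An untrained, frozen encoder corresponds loosely to a random-encoder baseline.

```python
import math
import random


def make_encoder(in_dim, hidden_dim, seed=0):
    """Hypothetical toy 'sentence encoder': one tanh layer with random weights.
    Left untrained, it stands in for a random-encoder baseline."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.5) for _ in range(in_dim)] for _ in range(hidden_dim)]


def encode(W, x):
    """Map an input vector x to hidden features h = tanh(W x)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(W, data, freeze_encoder, epochs=200, lr=0.5):
    """Train a logistic-regression head on top of the encoder with plain SGD.

    freeze_encoder=True  -> only the head (v, b) is updated (pretrain-then-freeze).
    freeze_encoder=False -> gradients also flow into W (fine-tuning).
    Returns (W, v, b); the caller's W is copied and never mutated.
    """
    W = [row[:] for row in W]
    v = [0.0] * len(W)
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            h = encode(W, x)
            p = sigmoid(sum(vi * hi for vi, hi in zip(v, h)) + b)
            g = p - y  # derivative of binary cross-entropy w.r.t. the logit
            if not freeze_encoder:
                # Backpropagate through tanh: dL/dW[j][i] = g * v[j] * (1 - h[j]^2) * x[i]
                for j, row in enumerate(W):
                    gj = g * v[j] * (1.0 - h[j] ** 2)
                    for i in range(len(row)):
                        row[i] -= lr * gj * x[i]
            # Head update uses the same h and the old v, so all gradients share one point.
            for j in range(len(v)):
                v[j] -= lr * g * h[j]
            b -= lr * g
    return W, v, b


def predict(W, v, b, x):
    """Return the 0/1 prediction of the head on top of the encoder."""
    h = encode(W, x)
    return 1 if sigmoid(sum(vi * hi for vi, hi in zip(v, h)) + b) >= 0.5 else 0


# Toy data (assumption): the label is simply the first input feature.
data = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([1.0, 1.0], 1), ([0.0, 0.0], 0)]
encoder = make_encoder(in_dim=2, hidden_dim=4)

frozen_W, frozen_v, frozen_b = train(encoder, data, freeze_encoder=True)
tuned_W, tuned_v, tuned_b = train(encoder, data, freeze_encoder=False)
```

In the frozen run the encoder weights come back unchanged and only the head has learned anything. In the fine-tuned run the encoder itself has moved, which is the mechanism by which intermediate-task fine-tuning can help or, as the abstract reports for BERT, hurt later transfer.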
