On the comparability of Pre-trained Language Models

  • 2020-01-03 10:53:35
  • Matthias Aßenmacher, Christian Heumann

Abstract

Recent developments in unsupervised representation learning have successfully established the concept of transfer learning in NLP. Three main forces are driving the improvements in this area of research: More elaborate architectures are making better use of contextual information. Instead of simply plugging in static pre-trained representations, these are learned based on surrounding context in end-to-end trainable models with more intelligently designed language modelling objectives. Along with this, larger corpora are used as resources for pre-training large language models in a self-supervised fashion, which are afterwards fine-tuned on supervised tasks. Advances in parallel computing as well as in cloud computing have made it possible to train these models with growing capacities in the same or even in shorter time than previously established models. These three developments culminate in new state-of-the-art (SOTA) results being reported at an ever higher frequency. It is not always obvious where these improvements originate from, as it is not possible to completely disentangle the contributions of the three driving forces. We set out to provide a clear and concise overview of several large pre-trained language models, which achieved SOTA results in the last two years, with respect to their use of new architectures and resources. We want to clarify for the reader where the differences between the models are, and we furthermore attempt to gain some insight into the individual contributions of lexical/computational improvements as well as of architectural changes. We explicitly do not intend to quantify these contributions, but rather see our work as an overview in order to identify potential starting points for benchmark comparisons. Furthermore, we tentatively want to point at potential possibilities for improvement in the field of open-sourcing and reproducible research.

 
