Abstract
The lack of a commonly used benchmark data set (collection) such as (Super-)GLUE (Wang et al., 2018, 2019) for the evaluation of non-English pre-trained language models is a severe shortcoming of current English-centric NLP research. It concentrates a large part of the research on English, neglecting the uncertainty when transferring conclusions found for the English language to other languages. We evaluate the performance of the German and multilingual BERT-based models currently available via the huggingface transformers library on the four tasks of the GermEval17 workshop. We compare them to pre-BERT architectures (Wojatzki et al., 2017; Schmitt et al., 2018; Attia et al., 2018) as well as to an ELMo-based architecture (Biesialska et al., 2020) and a BERT-based approach (Guhr et al., 2020). The observed improvements are put in relation to those for similar tasks and similar models (pre-BERT vs. BERT-based) for the English language in order to draw tentative conclusions about whether the observed improvements are transferable to German or potentially other related languages.