ClimateBert: A Pretrained Language Model for Climate-Related Text

  • 2022-09-26 07:09:21
  • Nicolas Webersinke, Mathias Kraus, Julia Anna Bingler, Markus Leippold

Abstract

Over the recent years, large pretrained language models (LM) have revolutionized the field of natural language processing (NLP). However, while pretraining on general language has been shown to work very well for common language, it has been observed that niche language poses problems. In particular, climate-related texts include specific language that common LMs cannot represent accurately. We argue that this shortcoming of today's LMs limits the applicability of modern NLP to the broad field of text processing of climate-related texts. As a remedy, we propose ClimateBert, a transformer-based language model that is further pretrained on over 1.6 million paragraphs of climate-related texts, crawled from various sources such as common news, research articles, and climate reporting of companies. We find that ClimateBert leads to a 46% improvement on a masked language model objective which, in turn, leads to lowering error rates by 3.57% to 35.71% for various climate-related downstream tasks like text classification, sentiment analysis, and fact-checking.

 
