Latin BERT: A Contextual Language Model for Classical Philology

  • 2020-09-21 17:47:44
  • David Bamman, Patrick J. Burns

Abstract

We present Latin BERT, a contextual language model for the Latin language, trained on 642.7 million words from a variety of sources spanning the Classical era to the 21st century. In a series of case studies, we illustrate the affordances of this language-specific model both for work in natural language processing for Latin and in using computational methods for traditional scholarship: we show that Latin BERT achieves a new state of the art for part-of-speech tagging on all three Universal Dependency datasets for Latin and can be used for predicting missing text (including critical emendations); we create a new dataset for assessing word sense disambiguation for Latin and demonstrate that Latin BERT outperforms static word embeddings; and we show that it can be used for semantically-informed search by querying contextual nearest neighbors. We publicly release trained models to help drive future work in this space.

 
