Restoring ancient text using deep learning: a case study on Greek epigraphy

  • 2019-10-14 16:43:00
  • Yannis Assael, Thea Sommerschield, Jonathan Prag

Abstract

Ancient history relies on disciplines such as epigraphy, the study of ancient inscribed texts, for evidence of the recorded past. However, these texts, "inscriptions", are often damaged over the centuries, and illegible parts of the text must be restored by specialists, known as epigraphists. This work presents Pythia, the first ancient text restoration model that recovers missing characters from a damaged text input using deep neural networks. Its architecture is carefully designed to handle long-term context information, and deal efficiently with missing or corrupted character and word representations. To train it, we wrote a non-trivial pipeline to convert PHI, the largest digital corpus of ancient Greek inscriptions, to machine-actionable text, which we call PHI-ML. On PHI-ML, Pythia's predictions achieve a 30.1% character error rate, compared to the 57.3% of human epigraphists. Moreover, in 73.5% of cases the ground-truth sequence was among the Top-20 hypotheses of Pythia, which effectively demonstrates the impact of this assistive method on the field of digital epigraphy, and sets the state-of-the-art in ancient text restoration.
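
The two metrics quoted in the abstract can be illustrated with a minimal sketch. It assumes the common definition of character error rate (Levenshtein edit distance divided by the length of the ground truth) and treats Top-20 accuracy as a membership test over a ranked list of hypotheses. The abstract does not give the exact evaluation procedure, so the function names and the normalization below are illustrative assumptions, not the authors' code.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string costs i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(prediction: str, truth: str) -> float:
    """Assumed definition: edit distance normalized by ground-truth length."""
    if not truth:
        raise ValueError("ground truth must be non-empty")
    return edit_distance(prediction, truth) / len(truth)


def in_top_k(hypotheses: list[str], truth: str, k: int = 20) -> bool:
    """True if the ground truth appears among the first k ranked hypotheses."""
    return truth in hypotheses[:k]


if __name__ == "__main__":
    # Hypothetical restoration: one wrong character in an eight-character gap.
    truth = "αθηναιων"
    prediction = "αθηναιον"
    print(character_error_rate(prediction, truth))
    print(in_top_k([prediction, truth], truth, k=20))
```

Averaging `character_error_rate` over a test set gives a corpus-level figure comparable in kind to the 30.1% reported above, and the fraction of examples for which `in_top_k` is true corresponds to the Top-20 figure.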

 
