ParCourE: A Parallel Corpus Explorer for a Massively Multilingual Corpus

  • 2021-07-15 08:23:08
  • Ayyoob Imani, Masoud Jalili Sabet, Philipp Dufter, Michael Cysouw, Hinrich Schütze

Abstract

With more than 7000 languages worldwide, multilingual natural language processing (NLP) is essential from both an academic and a commercial perspective. Researching typological properties of languages is fundamental for progress in multilingual NLP. Examples include assessing language similarity for effective transfer learning, injecting inductive biases into machine learning models, or creating resources such as dictionaries and inflection tables. We provide ParCourE, an online tool for browsing a word-aligned parallel corpus covering 1334 languages. We give evidence that this is useful for typological research. ParCourE can be set up for any parallel corpus and can thus be used for typological research on other corpora as well as for exploring their quality and properties.

 
