A Corpus for Automatic Readability Assessment and Text Simplification of German

  • 2019-09-19 16:07:32
  • Alessia Battisti, Sarah Ebling

Abstract

In this paper, we present a corpus for use in automatic readability assessment and automatic text simplification of German. The corpus is compiled from web sources and consists of approximately 211,000 sentences. As a novel contribution, it contains information on text structure, typography, and images, which can be exploited as part of machine learning approaches to readability assessment and text simplification. The focus of this publication is on representing such information as an extension to an existing corpus standard.

 
