LemmaTag: Jointly Tagging and Lemmatizing for Morphologically-Rich Languages with BRNNs

  • 2018-08-10 20:46:32
  • Daniel Kondratyuk, Tomáš Gavenčiak, Milan Straka
  • 6

Abstract

We present LemmaTag, a featureless recurrent neural network architecture thatjointly generates part-of-speech tags and lemmatizes sentences of languageswith complex morphology, using bidirectional RNNs with character-level andword-level embeddings. We demonstrate that both tasks benefit from sharing theencoding part of the network and from using the tagger output as an input tothe lemmatizer. We evaluate our model across several morphologically-richlanguages, surpassing state-of-the-art accuracy in both part-of-speech taggingand lemmatization in Czech, German, and Arabic.

 

Quick Read (beta)

loading the full paper ...