Neural Semi-Markov Conditional Random Fields for Robust Character-Based Part-of-Speech Tagging

  • 2018-08-13 13:44:22
  • Apostolos Kemos, Heike Adel, Hinrich Schütze

Abstract

Character-level models of tokens have been shown to be effective at dealing with within-token noise and out-of-vocabulary words. But these models still rely on correct token boundaries. In this paper, we propose a novel end-to-end character-level model and demonstrate its effectiveness in multilingual settings and when token boundaries are noisy. Our model is a semi-Markov conditional random field with neural networks for character and segment representation. It requires no tokenizer. The model matches state-of-the-art baselines for various languages and significantly outperforms them on a noisy English version of a part-of-speech tagging benchmark dataset.
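
To illustrate how a semi-Markov model can segment and label a raw character sequence in one pass, without a tokenizer, here is a minimal sketch of semi-Markov Viterbi decoding. It is not the authors' implementation: the paper scores segments with neural networks, whereas the hypothetical `toy_segment_score` below uses a hand-written lexicon. The names `semi_markov_viterbi`, `toy_segment_score`, `toy_transition_score`, the `SPACE` label, and the maximum segment length are all assumptions made for this example.

```python
from typing import Callable, List, Tuple

Segment = Tuple[int, int, str]  # (start, end_exclusive, label)


def semi_markov_viterbi(
    n: int,
    labels: List[str],
    max_len: int,
    segment_score: Callable[[int, int, str], float],
    transition_score: Callable[[str, str], float],
) -> Tuple[float, List[Segment]]:
    """Return the best-scoring labeled segmentation of positions 0..n."""
    if n == 0:
        return 0.0, []
    neg_inf = float("-inf")
    # best[j][y]: best score of a segmentation of chars[0:j]
    # whose final segment ends at j and carries label y.
    best = [{y: neg_inf for y in labels} for _ in range(n + 1)]
    back = [{y: None for y in labels} for _ in range(n + 1)]
    for j in range(1, n + 1):
        for y in labels:
            # The final segment is chars[i:j], at most max_len characters long.
            for i in range(max(0, j - max_len), j):
                seg = segment_score(i, j, y)
                if i == 0:
                    # First segment: no previous label to transition from.
                    candidates = [(seg, None)]
                else:
                    candidates = [
                        (best[i][yp] + transition_score(yp, y) + seg, yp)
                        for yp in labels
                    ]
                for cand, yp in candidates:
                    if cand > best[j][y]:
                        best[j][y] = cand
                        back[j][y] = (i, yp)
    # Follow back-pointers from the best final label.
    last = max(labels, key=lambda y: best[n][y])
    total = best[n][last]
    segments: List[Segment] = []
    j, y = n, last
    while j > 0:
        i, yp = back[j][y]
        segments.append((i, j, y))
        j, y = i, yp
    segments.reverse()
    return total, segments


# Toy scores (assumption): reward spans found in a tiny lexicon with the
# matching label, penalize everything else in proportion to span length.
text = "dogs run"
lexicon = {"dogs": "NOUN", "run": "VERB", " ": "SPACE"}
labels = ["NOUN", "VERB", "SPACE"]


def toy_segment_score(i: int, j: int, y: str) -> float:
    return 2.0 * (j - i) if lexicon.get(text[i:j]) == y else -1.0 * (j - i)


def toy_transition_score(prev: str, cur: str) -> float:
    return 0.0  # no label-to-label preference in this toy example


score, segments = semi_markov_viterbi(
    len(text), labels, 4, toy_segment_score, toy_transition_score
)
for i, j, y in segments:
    print(repr(text[i:j]), y)
```

The decoder considers every segment of up to `max_len` characters ending at each position, so token boundaries are an output of decoding rather than an input. In a neural variant, `segment_score` would be computed from learned character and segment representations instead of a lexicon lookup.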

 
