A Feature-Rich Vietnamese Named-Entity Recognition Model

  • 2018-03-12 17:07:40
  • Pham Quang Nhat Minh
In this paper, we present a feature-based named-entity recognition (NER) model that achieves state-of-the-art accuracy for Vietnamese. We combine word, word-shape, part-of-speech (PoS), chunk, Brown-cluster-based, and word-embedding-based features in a Conditional Random Fields (CRF) model. We also explore how the word segmentation, PoS tagging, and chunking output of many popular Vietnamese NLP toolkits affects the accuracy of the proposed feature-based NER model. To date, our work is the first to systematically perform an extrinsic evaluation of basic Vietnamese NLP toolkits on the downstream NER task. Experimental results show that while automatically generated word segmentation is useful, the PoS and chunking information produced by Vietnamese NLP tools does not benefit the proposed feature-based NER model.
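The abstract names the feature families but not their exact templates. The following is a minimal sketch of how such per-token features might be assembled for a CRF, assuming word-segmented input where syllables of a multi-syllable word are joined by underscores (e.g. `Hà_Nội`), Brown clusters given as a word-to-bit-string map, and word embeddings already reduced to a word-to-cluster-id map. All names here (`word_shape`, `short_shape`, `token_features`, `brown`, `emb_cluster`, the prefix lengths, the window size) are hypothetical illustrations, not the paper's actual configuration.

```python
from typing import Dict, List, Tuple


def word_shape(word: str) -> str:
    """Map each character to a class: X = upper, x = lower, d = digit, else itself."""
    out = []
    for ch in word:
        if ch.isupper():
            out.append("X")
        elif ch.islower():
            out.append("x")
        elif ch.isdigit():
            out.append("d")
        else:
            out.append(ch)  # keep punctuation and the '_' syllable joiner
    return "".join(out)


def short_shape(word: str) -> str:
    """Collapse consecutive repeats of the same shape class, e.g. 'Xxxx' -> 'Xx'."""
    collapsed: List[str] = []
    for ch in word_shape(word):
        if not collapsed or collapsed[-1] != ch:
            collapsed.append(ch)
    return "".join(collapsed)


def token_features(
    tokens: List[str],
    pos_tags: List[str],
    chunk_tags: List[str],
    brown: Dict[str, str],        # hypothetical: lowercased word -> Brown bit-string path
    emb_cluster: Dict[str, int],  # hypothetical: lowercased word -> embedding cluster id
    i: int,
    window: int = 2,
    prefix_lengths: Tuple[int, ...] = (4, 6, 10),
) -> Dict[str, str]:
    """Build a string-valued feature dict for token i (assumed templates, not the paper's)."""
    word = tokens[i]
    key = word.lower()
    feats: Dict[str, str] = {
        "bias": "1",
        "shape": word_shape(word),
        "short_shape": short_shape(word),
        "pos": pos_tags[i],
        "chunk": chunk_tags[i],
    }
    # Brown-cluster features: prefixes of the bit-string path give coarser clusters.
    path = brown.get(key)
    if path is not None:
        for k in prefix_lengths:
            feats[f"brown[:{k}]"] = path[:k]
    # Embedding-based feature: one possible choice is a cluster id over word vectors.
    cluster = emb_cluster.get(key)
    if cluster is not None:
        feats["emb_cluster"] = str(cluster)
    # Lowercased word identity in a +/- window, padded at sentence boundaries.
    for off in range(-window, window + 1):
        j = i + off
        feats[f"w[{off}]"] = tokens[j].lower() if 0 <= j < len(tokens) else "<PAD>"
    return feats
```

A sentence is then represented as a list of such dicts, one per token; feature dictionaries of this shape are what the `sklearn-crfsuite` wrapper around CRFsuite accepts as training input. Whether the PoS and chunk entries help is exactly the question the abstract's experiments address, so in practice they would be toggled on and off.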


