Analysis and representation of Igbo text document for a text-based system

  • 2020-09-05 19:07:17
  • Ifeanyi-Reuben Nkechi J., Ugwu Chidiebere, Adegbola Tunde
Advances in Information Technology (IT) have helped incorporate the three major Nigerian languages into text-based applications such as text mining, information retrieval and natural language processing. This paper focuses on the Igbo language, which uses compounding as a common type of word formation and has a large vocabulary of compound words. Collocation, word ordering and compounding play a major role in Igbo. The ambiguity in handling these compound words makes the representation of Igbo text documents very difficult, because it cannot be addressed by the most common and standard approach, the Bag-Of-Words (BOW) model of text representation, which ignores word order and the relations between words. This is a cause for concern and calls for an improved model that captures these features. This paper presents an analysis of Igbo language text documents, considering their compounding nature, and describes their representation with the word-based N-gram model to properly prepare them for any text-based application. The results show that bigram and trigram text representation models provide more semantic information and address the issues of compounding, word ordering and collocation, which are the major language peculiarities in Igbo. They are likely to give better performance when used in any Igbo text-based system.
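The contrast between Bag-Of-Words and word-based n-grams can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `word_ngrams`, the naive whitespace tokenization, and the sample phrase (which includes the compound "ụlọ akwụkwọ", commonly glossed as "school") are all assumptions made here and are not taken from the paper.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the word-level n-grams of `text` as a list of tuples.

    Tokenization is a plain whitespace split (an assumption for this
    sketch); a real Igbo pipeline would likely need normalization and
    language-aware tokenization first.
    """
    tokens = text.lower().split()
    # Slide a window of n tokens across the sequence, keeping word order.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical sample phrase, used only for illustration.
sample = "ọ gara ụlọ akwụkwọ"

# Unigram counts correspond to a Bag-Of-Words view: word order is lost.
unigrams = Counter(word_ngrams(sample, 1))

# Bigrams and trigrams keep adjacent words together, so a two-word
# compound survives as a single feature instead of two unrelated words.
bigrams = Counter(word_ngrams(sample, 2))
trigrams = Counter(word_ngrams(sample, 3))

for gram, count in bigrams.items():
    print(" ".join(gram), count)
```

In the unigram view the two parts of the compound are counted as independent words, while the bigram view keeps them as one unit, which is the kind of word-order and collocation information the abstract says the BOW model discards.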


