Challenges in Developing LRs for Non-Scheduled Languages: A Case of Magahi

  • 2021-11-30 12:07:23
  • Ritesh Kumar

Abstract

Magahi is an Indo-Aryan language spoken mainly in the eastern parts of India. Although it has a significant number of speakers, virtually no language resources (LRs) or language technologies (LTs) have been developed for it, mainly because of its status as a non-scheduled language. The present paper describes an attempt to develop an annotated corpus of Magahi. The data is taken mainly from a couple of Magahi blogs, some collections of Magahi stories, and recordings of conversations in Magahi, and it is annotated at the part-of-speech (POS) level using the BIS tagset.

 
