Fine-tuning Pre-trained Named Entity Recognition Models For Indian Languages

  • 2024-05-10 13:57:50
  • Sankalp Bahad, Pruthwik Mishra, Karunesh Arora, Rakesh Chandra Balabantaray, Dipti Misra Sharma, Parameswari Krishnamurthy


Named Entity Recognition (NER) is a useful component in Natural Language Processing (NLP) applications. It is used in various tasks such as Machine Translation, Summarization, Information Retrieval, and Question-Answering systems. Research on NER is centered around English and some other major languages, whereas limited attention has been given to Indian languages. We analyze the challenges and propose techniques that can be tailored for Multilingual Named Entity Recognition for Indian Languages. We present human-annotated named entity corpora of 40K sentences for 4 Indian languages from two of the major Indian language families. Additionally, we present a multilingual model fine-tuned on our dataset, which achieves an average F1 score of 0.80 on our dataset. We achieve comparable performance on completely unseen benchmark datasets for Indian languages, which affirms the usability of our model.


