iNLTK: Natural Language Toolkit for Indic Languages

Abstract

We present iNLTK, an open-source NLP library consisting of pre-trained language models and out-of-the-box support for Paraphrase Generation, Textual Similarity, Sentence Embeddings, Word Embeddings, Tokenization and Text Generation in 13 Indic Languages. By using pre-trained models from iNLTK for text classification on publicly available datasets, we significantly outperform previously reported results. On these datasets, we also show that by using pre-trained models and paraphrases from iNLTK, we can achieve more than 95% of the previous best performance by using less than 10% of the training data. iNLTK is already being widely used by the community and has 40,000+ downloads, 600+ stars and 100+ forks on GitHub. The library is available at https://github.com/goru001/inltk.
