Survey on Publicly Available Sinhala Natural Language Processing Tools and Research

Abstract

Sinhala is the native language of the Sinhalese people, who make up the largest ethnic group of Sri Lanka. The language belongs to the globe-spanning language tree, Indo-European. However, due to poverty in both linguistic and economic capital, Sinhala, from the perspective of Natural Language Processing tools and research, remains a resource-poor language which has neither the economic drive its cousin English has nor the sheer push of the law of numbers a language such as Chinese has. A number of research groups from Sri Lanka have noticed this dearth and the resultant dire need for proper tools and research for Sinhala natural language processing. However, due to various reasons, these attempts seem to lack coordination and awareness of each other. The objective of this paper is to fill that gap with a comprehensive literature survey of the publicly available Sinhala natural language tools and research so that the researchers working in this field can better utilize contributions of their peers. As such, we shall be uploading this paper to arXiv and updating it periodically to reflect the advances made in the field.
