Abstract
At the center of the underlying issues that halt the advancement of Indonesian natural language processing (NLP) research, we find data scarcity. Resources in Indonesian languages, especially the local ones, are extremely scarce and underrepresented. Many Indonesian researchers do not publish their datasets. Furthermore, the few public datasets that exist are scattered across different platforms, which makes reproducible and data-centric research in Indonesian NLP even more arduous. Rising to this challenge, we initiate the first Indonesian NLP crowdsourcing effort, NusaCrowd. NusaCrowd strives to provide the largest aggregation of datasheets with standardized data loading for NLP tasks in all Indonesian languages. By enabling open and centralized access to Indonesian NLP resources, we hope NusaCrowd can tackle the data scarcity problem hindering NLP progress in Indonesia and encourage NLP practitioners to move towards collaboration.