Rethinking Annotation: Can Language Learners Contribute?

Abstract

Researchers have traditionally recruited native speakers to provide annotations for widely used benchmark datasets. However, there are languages for which recruiting native speakers can be difficult, and it would help to find learners of those languages to annotate the data. In this paper, we investigate whether language learners can contribute annotations to benchmark datasets. In a carefully controlled annotation experiment, we recruit 36 language learners, provide two types of additional resources (dictionaries and machine-translated sentences), and perform mini-tests to measure their language proficiency. We target three languages, English, Korean, and Indonesian, and the four NLP tasks of sentiment analysis, natural language inference, named entity recognition, and machine reading comprehension. We find that language learners, especially those with intermediate or advanced levels of language proficiency, are able to provide fairly accurate labels with the help of additional resources. Moreover, we show that data annotation improves learners' language proficiency in terms of vocabulary and grammar. One implication of our findings is that broadening the annotation task to include language learners can open up the opportunity to build benchmark datasets for languages for which it is difficult to recruit native speakers.
