Toward Effective Automated Content Analysis via Crowdsourcing

Abstract

Many computer scientists use the aggregated answers of online workers to represent ground truth. Prior work has shown that aggregation methods such as majority voting are effective for measuring relatively objective features. For subjective features such as semantic connotation, online workers, known for optimizing their hourly earnings, tend to deteriorate in the quality of their responses as they work longer. In this paper, we aim to address this issue by proposing a quality-aware semantic data annotation system. We observe that with timely feedback on workers' performance quantified by quality scores, better informed online workers can maintain the quality of their labeling throughout an extended period of time. We validate the effectiveness of the proposed annotation system through i) evaluating performance based on an expert-labeled dataset, and ii) demonstrating machine learning tasks that can lead to consistent learning behavior with 70%-80% accuracy. Our results suggest that with our system, researchers can collect high-quality answers of subjective semantic features at a large scale.
