Abstract
Effective labeled data collection plays a critical role in developing and fine-tuning robust streaming analytics systems. However, continuously labeling documents to filter relevant information poses significant challenges, such as a limited labeling budget or a lack of high-quality labels. There is a need for efficient human-in-the-loop machine learning (HITL-ML) design to improve streaming analytics systems. One particular HITL-ML approach is online active learning, which involves iteratively selecting a small set of the most informative documents for labeling to enhance the ML model performance. The performance of such algorithms can degrade because of human errors in labeling. To address these challenges, we propose ORIS, a method to perform Online active learning using Reinforcement learning-based Inclusive Sampling of documents for labeling. ORIS aims to develop a novel Deep Q-Network-based strategy for sampling incoming documents that minimizes human errors in labeling and enhances the ML model performance. We evaluate the ORIS method on emotion recognition tasks, where it outperforms traditional baselines in terms of both human labeling performance and ML model performance.