OpenPath: Open-Set Active Learning for Pathology Image Classification via Pre-trained Vision-Language Models

Abstract

Pathology image classification plays a crucial role in accurate medical diagnosis and treatment planning. Training high-performance models for this task typically requires large-scale annotated datasets, which are both expensive and time-consuming to acquire. Active Learning (AL) offers a solution by iteratively selecting the most informative samples for annotation, thereby reducing the labeling effort. However, most AL methods are designed under the assumption of a closed-set scenario, where all the unannotated images belong to target classes. In real-world clinical environments, the unlabeled pool often contains a substantial amount of Out-Of-Distribution (OOD) data, leading to low efficiency of annotation in traditional AL methods. Furthermore, most existing AL methods start with random selection in the first query round, leading to a significant waste of labeling costs in open-set scenarios. To address these challenges, we propose OpenPath, a novel open-set active learning approach for pathological image classification leveraging a pre-trained Vision-Language Model (VLM). In the first query, we propose task-specific prompts that combine target and relevant non-target class prompts to effectively select In-Distribution (ID) and informative samples from the unlabeled pool. In subsequent queries, Diverse Informative ID Sampling (DIS) that includes Prototype-based ID candidate Selection (PIS) and Entropy-Guided Stochastic Sampling (EGSS) is proposed to ensure both purity and informativeness in a query, avoiding the selection of OOD samples. Experiments on two public pathology image datasets show that OpenPath significantly enhances the model's performance due to its high purity of selected samples, and outperforms several state-of-the-art open-set AL methods. The code is available at https://github.com/HiLab-git/OpenPath.
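
The first-query idea described above (scoring unlabeled images against both target and non-target class prompts, then keeping likely-ID samples that are also informative) can be illustrated with a minimal sketch. This is not the authors' implementation; the repository linked above is the reference for that. The sketch assumes image and prompt embeddings have already been produced by a CLIP-style VLM, and the function name `first_query_sketch`, the 0.5 ID threshold, the temperature value, and the use of entropy as the informativeness measure are all assumptions made for illustration.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def first_query_sketch(image_emb, target_text_emb, nontarget_text_emb,
                       budget, temperature=0.01, id_threshold=0.5):
    """Hypothetical cold-start query for open-set active learning.

    image_emb:          (N, D) image embeddings from a VLM (assumed given)
    target_text_emb:    (K, D) prompt embeddings for the K target classes
    nontarget_text_emb: (M, D) prompt embeddings for non-target classes
    Returns indices of up to `budget` samples to send for annotation.
    """
    n_target = target_text_emb.shape[0]
    text_emb = np.concatenate([target_text_emb, nontarget_text_emb], axis=0)

    # Cosine similarity between every image and every prompt.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    probs = softmax(image_emb @ text_emb.T / temperature, axis=1)

    # ID score: probability mass assigned to target-class prompts (assumption).
    id_score = probs[:, :n_target].sum(axis=1)
    candidates = np.flatnonzero(id_score > id_threshold)

    # Informativeness: entropy over the renormalised target-class probabilities.
    target_probs = probs[candidates, :n_target] / id_score[candidates, None]
    entropy = -(target_probs * np.log(target_probs + 1e-12)).sum(axis=1)

    # Keep the most uncertain likely-ID samples, up to the budget.
    order = np.argsort(-entropy)
    return candidates[order[:budget]]
```

In this sketch, an image whose embedding is closest to a non-target prompt receives a low ID score and is never queried, while an image that sits between two target prompts has high entropy and is queried first. The later-round components named in the abstract (PIS and EGSS) are not covered here.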
