Keep It on a Leash: Controllable Pseudo-label Generation Towards Realistic Long-Tailed Semi-Supervised Learning

  • 2025-11-03 12:24:52
  • Yaxin Hou, Bo Han, Yuheng Jia, Hui Liu, Junhui Hou

Abstract

Current long-tailed semi-supervised learning methods assume that labeled data exhibit a long-tailed distribution, and unlabeled data adhere to a typical predefined distribution (i.e., long-tailed, uniform, or inverse long-tailed). However, the distribution of the unlabeled data is generally unknown and may follow an arbitrary distribution. To tackle this challenge, we propose a Controllable Pseudo-label Generation (CPG) framework, expanding the labeled dataset with the progressively identified reliable pseudo-labels from the unlabeled dataset and training the model on the updated labeled dataset with a known distribution, making it unaffected by the unlabeled data distribution. Specifically, CPG operates through a controllable self-reinforcing optimization cycle: (i) at each training step, our dynamic controllable filtering mechanism selectively incorporates reliable pseudo-labels from the unlabeled dataset into the labeled dataset, ensuring that the updated labeled dataset follows a known distribution; (ii) we then construct a Bayes-optimal classifier using logit adjustment based on the updated labeled data distribution; (iii) this improved classifier subsequently helps identify more reliable pseudo-labels in the next training step. We further theoretically prove that this optimization cycle can significantly reduce the generalization error under some conditions. Additionally, we propose a class-aware adaptive augmentation module to further improve the representation of minority classes, and an auxiliary branch to maximize data utilization by leveraging all labeled and unlabeled samples. Comprehensive evaluations on various commonly used benchmark datasets show that CPG achieves consistent improvements, surpassing state-of-the-art methods by up to $\textbf{15.97\%}$ in accuracy. The code is available at https://github.com/yaxinhou/CPG.
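
To make steps (i) and (ii) of the cycle more concrete, the following is a minimal sketch in plain Python and is not the authors' implementation (see the linked repository for that). It assumes one common post-hoc form of logit adjustment, subtracting `tau * log(prior)` from each logit, and a hypothetical filtering rule that keeps only confident predictions up to a per-class quota. The function names, the `quota` and `threshold` parameters, and the quota rule itself are illustrative assumptions; the abstract does not specify the actual filtering mechanism.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adjust_logits(logits, class_counts, tau=1.0):
    """Post-hoc logit adjustment: subtract tau * log(class prior) per class.

    class_counts is the (known) class distribution of the labeled set.
    """
    n = sum(class_counts)
    return [z - tau * math.log(c / n) for z, c in zip(logits, class_counts)]


def select_pseudo_labels(unlabeled_logits, class_counts, quota, threshold=0.95):
    """Hypothetical filter (an assumption, not the paper's rule).

    Keeps predictions whose adjusted confidence is at least `threshold`,
    most confident first, and admits at most quota[k] samples for class k
    so the class counts of the expanded labeled set stay under control.
    Returns a list of (sample_index, pseudo_label) pairs.
    """
    candidates = []
    for i, logits in enumerate(unlabeled_logits):
        probs = softmax(adjust_logits(logits, class_counts))
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= threshold:
            candidates.append((probs[label], i, label))

    candidates.sort(reverse=True)  # most confident candidates first
    taken = [0] * len(class_counts)
    selected = []
    for _, i, label in candidates:
        if taken[label] < quota[label]:
            taken[label] += 1
            selected.append((i, label))
    return selected
```

In a training loop along the lines the abstract describes, the selected pairs would be added to the labeled set, `class_counts` would be updated accordingly, and the adjusted classifier would be used again in the next step to find further pseudo-labels.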
