Abstract
Semi-supervised learning on class-imbalanced data, although a realistic problem, has been understudied. While existing semi-supervised learning (SSL) methods are known to perform poorly on minority classes, we find that they still generate high-precision pseudo-labels on minority classes. Exploiting this property, we propose Class-Rebalancing Self-Training (CReST), a simple yet effective framework to improve existing SSL methods on class-imbalanced data. CReST iteratively retrains a baseline SSL model with a labeled set expanded by adding pseudo-labeled samples from an unlabeled set, where pseudo-labeled samples from minority classes are selected more frequently according to an estimated class distribution. We also propose a progressive distribution alignment that adaptively adjusts the rebalancing strength; we dub the resulting method CReST+. We show that CReST and CReST+ improve state-of-the-art SSL algorithms on various class-imbalanced datasets and consistently outperform other popular rebalancing methods. Code is available at https://github.com/google-research/crest.
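
The class-rebalanced selection step described above can be illustrated with a minimal Python sketch. It is not the released implementation. The function names, the `alpha` exponent, the `predict` callable, and the specific rate rule (the k-th most frequent class is kept at a rate set by the k-th least frequent class's share of the largest class) are assumptions made for illustration. The sketch also assumes the class distribution is estimated from labeled-set counts and that every predicted class appears in the labeled set.

```python
import random
from collections import Counter


def class_sampling_rates(class_counts, alpha=1.0):
    """Map each class to a keep-probability; rarer classes get higher rates.

    Illustrative rule (an assumption): the k-th most frequent class is kept
    with probability (count of the k-th least frequent class / largest
    count) ** alpha, so the rarest class is always kept.
    """
    # Classes ordered from most to least frequent.
    ranked = sorted(class_counts, key=class_counts.get, reverse=True)
    largest = class_counts[ranked[0]]
    rates = {}
    for k, cls in enumerate(ranked):
        mirrored = ranked[len(ranked) - 1 - k]  # class at the mirrored rank
        rates[cls] = (class_counts[mirrored] / largest) ** alpha
    return rates


def select_pseudo_labeled(pseudo_labels, rates, rng):
    """Return indices of unlabeled samples to add to the labeled set."""
    return [i for i, y in enumerate(pseudo_labels) if rng.random() < rates[y]]


def expand_labeled_set(labeled, unlabeled, predict, alpha=1.0, seed=0):
    """One self-training round: pseudo-label, subsample by class, and merge.

    `labeled` is a list of (sample, label) pairs; `predict` is a hypothetical
    stand-in for the trained baseline SSL model.
    """
    rng = random.Random(seed)
    counts = Counter(label for _, label in labeled)
    rates = class_sampling_rates(counts, alpha)
    pseudo = [predict(x) for x in unlabeled]
    keep = select_pseudo_labeled(pseudo, rates, rng)
    return labeled + [(unlabeled[i], pseudo[i]) for i in keep]
```

In this sketch, `alpha=0` keeps every pseudo-labeled sample regardless of class, while larger values of `alpha` lower the keep rate for majority classes; adjusting such a strength over training generations is the kind of knob a progressive schedule could control.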