Abstract
We present streaming self-training (SST), which aims to democratize the process of learning visual recognition models so that a non-expert user can define a new task depending on their needs via a few labeled examples and minimal domain knowledge. Key to SST are two crucial observations: (1) domain-agnostic unlabeled images enable us to learn better models with a few labeled examples without any additional knowledge or supervision; and (2) learning is a continuous process and can be done by constructing a schedule of learning updates that iterates between pre-training on novel segments of the streams of unlabeled data and fine-tuning on the small, fixed labeled dataset. This allows SST to overcome the need for a large number of domain-specific labeled and unlabeled examples, exorbitant computational resources, and domain/task-specific knowledge. In this setting, classical semi-supervised approaches require a large number of domain-specific labeled and unlabeled examples, immense resources to process data, and expert knowledge of a particular task. For these reasons, semi-supervised learning has been restricted to a few places that can house the required computational and human resources. In this work, we overcome these challenges and demonstrate our findings on a wide range of visual recognition tasks, including fine-grained image classification, surface normal estimation, and semantic segmentation. We also demonstrate our findings on diverse domains, including medical, satellite, and agricultural imagery, where large amounts of labeled or unlabeled data do not exist.
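The schedule in observation (2) can be illustrated with a minimal sketch. It is not the authors' implementation: the names `segments` and `run_schedule`, the `segment_size` parameter, and the `pretrain` and `finetune` callables are hypothetical placeholders, and the abstract does not specify how either training step is carried out. The sketch shows only the alternation between a pre-training update on each novel segment of the unlabeled stream and a fine-tuning update on the same fixed labeled set.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Sequence, TypeVar

X = TypeVar("X")  # an unlabeled example
Y = TypeVar("Y")  # a labeled example
M = TypeVar("M")  # whatever object represents the model


def segments(stream: Iterable[X], segment_size: int) -> Iterator[List[X]]:
    """Yield consecutive, non-overlapping chunks of the unlabeled stream."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    it = iter(stream)
    while True:
        chunk = list(islice(it, segment_size))
        if not chunk:
            return
        yield chunk


def run_schedule(
    model: M,
    stream: Iterable[X],
    labeled: Sequence[Y],
    segment_size: int,
    pretrain: Callable[[M, List[X]], M],    # hypothetical: update on unlabeled data
    finetune: Callable[[M, Sequence[Y]], M],  # hypothetical: update on labeled data
) -> M:
    """Alternate pre-training on each new segment with fine-tuning on the fixed labeled set."""
    for segment in segments(stream, segment_size):
        model = pretrain(model, segment)   # each segment is seen once, as it arrives
        model = finetune(model, labeled)   # the labeled set never changes
    return model


if __name__ == "__main__":
    # Toy stand-ins: the "model" is just a log of the updates applied to it.
    log = run_schedule(
        model=[],
        stream=range(5),
        labeled=["a", "b"],
        segment_size=2,
        pretrain=lambda m, seg: m + [("pretrain", tuple(seg))],
        finetune=lambda m, lab: m + [("finetune", len(lab))],
    )
    for step in log:
        print(step)
```

Passing the two training steps as callables keeps the sketch independent of any particular framework; a real system would substitute its own update routines.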