Improving Consistency-Based Semi-Supervised Learning with Weight Averaging

Abstract

Recent advances in deep unsupervised learning have renewed interest in semi-supervised methods, which can learn from both labeled and unlabeled data. Presently the most successful approaches to semi-supervised learning are based on consistency regularization, whereby a model is trained to be robust to small perturbations of its inputs and parameters. We show that consistency regularization leads to flatter but narrower optima. We also show that the test error surface for these methods is approximately convex in regions of weight space traversed by SGD. Inspired by these observations, we propose to train consistency-based semi-supervised models with stochastic weight averaging (SWA), a recent method which averages weights along the trajectory of SGD. We also develop fast-SWA, which further accelerates convergence by averaging multiple points within each cycle of a cyclical learning rate schedule. With fast-SWA we achieve the best known semi-supervised results on CIFAR-10 and CIFAR-100 over many different numbers of observed training labels. For example, we achieve 5.0% error on CIFAR-10 with only 4000 labels, compared to 6.28% of the previous best result in the literature. We also improve the best known result from 80% accuracy to 83% for domain adaptation from CIFAR-10 to STL. Finally, we show that with fast-SWA the simple $\Pi$ model becomes state-of-the-art for large labeled settings.
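
The difference between the two averaging schemes can be illustrated with a minimal sketch. It assumes an equal-weight running mean over flat parameter vectors and a fixed cycle length. The names `RunningWeightAverage`, `collection_epochs`, `cycle_len`, and `points_per_cycle` are hypothetical and chosen for illustration; the abstract does not specify the exact schedule or when averaging starts.

```python
from typing import List, Sequence


class RunningWeightAverage:
    """Equal-weight running mean of parameter vectors (illustrative only)."""

    def __init__(self) -> None:
        self.mean: List[float] = []
        self.count = 0

    def update(self, weights: Sequence[float]) -> None:
        # Incorporate one more set of weights into the running mean.
        if self.count == 0:
            self.mean = list(weights)
        else:
            n = self.count
            self.mean = [(m * n + w) / (n + 1) for m, w in zip(self.mean, weights)]
        self.count += 1


def collection_epochs(start: int, end: int, cycle_len: int,
                      points_per_cycle: int) -> List[int]:
    """Epochs in [start, end) at which weights are added to the average.

    Assumption: points_per_cycle == 1 collects once at the end of each
    learning-rate cycle (SWA-like); points_per_cycle > 1 collects several
    evenly spaced points inside each cycle (fast-SWA-like).
    """
    if points_per_cycle < 1 or cycle_len % points_per_cycle != 0:
        raise ValueError("points_per_cycle must be a positive divisor of cycle_len")
    every = cycle_len // points_per_cycle
    return [e for e in range(start, end) if (e - start + 1) % every == 0]


# Two cycles of 6 epochs each.
swa_epochs = collection_epochs(0, 12, cycle_len=6, points_per_cycle=1)
fast_swa_epochs = collection_epochs(0, 12, cycle_len=6, points_per_cycle=3)

# Averaging two toy weight vectors.
avg = RunningWeightAverage()
avg.update([1.0, 2.0])
avg.update([3.0, 4.0])
```

In this toy setup the fast-SWA-like schedule contributes three times as many points to the average over the same number of epochs, which is the intuition behind the faster convergence described above.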
