Unsupervised Data Augmentation for Consistency Training

Abstract

Despite much success, deep learning generally does not perform well with small labeled training sets. In these scenarios, data augmentation has shown much promise in alleviating the need for more labeled data, but it so far has mostly been applied in supervised settings and achieved limited gains. In this work, we propose to apply data augmentation to unlabeled data in a semi-supervised learning setting. Our method, named Unsupervised Data Augmentation or UDA, encourages the model predictions to be consistent between an unlabeled example and an augmented unlabeled example. Unlike previous methods that use random noise such as Gaussian noise or dropout noise, UDA has a small twist in that it makes use of harder and more realistic noise generated by state-of-the-art data augmentation methods. This small twist leads to substantial improvements on six language tasks and three vision tasks even when the labeled set is extremely small. For example, on the IMDb text classification dataset, with only 20 labeled examples, UDA achieves an error rate of 4.20, outperforming the state-of-the-art model trained on 25,000 labeled examples. On standard semi-supervised learning benchmarks CIFAR-10 and SVHN, UDA outperforms all previous approaches and achieves an error rate of 2.7% on CIFAR-10 with only 4,000 examples and an error rate of 2.85% on SVHN with only 250 examples, nearly matching the performance of models trained on the full sets which are one or two orders of magnitude larger. UDA also works well on large-scale datasets such as ImageNet. When trained with 10% of the labeled set, UDA improves the top-1/top-5 accuracy from 55.1/77.3% to 68.7/88.5%. For the full ImageNet with 1.3M extra unlabeled data, UDA further pushes the performance from 78.3/94.4% to 79.0/94.5%.
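
The consistency idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the divergence measure, the loss weighting, or the augmentation pipeline, so everything below is an assumption made for illustration: the KL divergence as the consistency measure, the `weight` parameter, and the hypothetical function names `consistency_loss` and `uda_objective`. The model and the augmentation step are left out; the functions only take the logits a model would produce for an unlabeled example and for its augmented version.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * (math.log(pi + eps) - math.log(qi + eps))
        for pi, qi in zip(p, q)
    )


def consistency_loss(logits_original, logits_augmented):
    """Penalize disagreement between predictions on an unlabeled example
    and on its augmented version (assumption: KL divergence is used)."""
    p = softmax(logits_original)   # prediction on the original example
    q = softmax(logits_augmented)  # prediction on the augmented example
    return kl_divergence(p, q)


def uda_objective(supervised_loss, unlabeled_pairs, weight=1.0):
    """Combine a supervised loss with the mean consistency loss.

    `unlabeled_pairs` is a list of (logits_original, logits_augmented)
    tuples; `weight` is a hypothetical balancing coefficient.
    """
    if not unlabeled_pairs:
        return supervised_loss
    mean_consistency = sum(
        consistency_loss(orig, aug) for orig, aug in unlabeled_pairs
    ) / len(unlabeled_pairs)
    return supervised_loss + weight * mean_consistency


if __name__ == "__main__":
    # Identical predictions incur no penalty; diverging ones do.
    print(consistency_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
    print(consistency_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```

In a real training loop the prediction on the original example would typically be treated as a fixed target, so that gradients flow only through the augmented branch. That detail is also an assumption here, since the abstract does not state it.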
