Diversify Your Vision Datasets with Automatic Diffusion-Based Augmentation

Abstract

Many fine-grained classification tasks, like rare animal identification, have limited training data, and consequently classifiers trained on these datasets often fail to generalize to variations in the domain, like changes in weather or location. We therefore explore how natural language descriptions of the domains seen in the training data can be used with large vision models trained on diverse pretraining datasets to generate useful variations of the training data. We introduce ALIA (Automated Language-guided Image Augmentation), a method that uses large vision and language models to automatically generate natural language descriptions of a dataset's domains and augment the training data via language-guided image editing. To maintain data integrity, a model trained on the original dataset filters out minimal image edits and those that corrupt class-relevant information. The resulting dataset is visually consistent with the original training data and offers significantly greater diversity. On fine-grained and cluttered datasets for classification and detection, ALIA surpasses traditional data augmentation and text-to-image generated data by up to 15%, often even outperforming equivalent additions of real data. Code is available at https://github.com/lisadunlap/ALIA.
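The filtering step can be illustrated with a small sketch. It assumes a classifier trained on the original data has already produced softmax scores for each edited image, and it applies one plausible confidence rule with hypothetical per-class thresholds: a confident, correct prediction suggests the edit was too minimal, a confident, wrong prediction suggests the edit corrupted class-relevant content, and an uncertain prediction keeps the edit. The function names, the threshold values, and the exact rule are illustrative assumptions, not the ALIA implementation.

```python
from typing import Dict, List, Sequence, Tuple


def keep_edit(probs: Sequence[float], true_label: int,
              thresholds: Dict[int, float]) -> bool:
    """Return True if an edited image should stay in the augmented set.

    probs      -- softmax scores from a classifier trained on the original
                  dataset, evaluated on the edited image (assumed interface).
    true_label -- class of the source image the edit was made from.
    thresholds -- hypothetical per-class confidence cutoffs.
    """
    # Index of the highest-scoring class (ties resolve to the lowest index).
    pred = max(range(len(probs)), key=lambda i: probs[i])
    conf = probs[pred]
    if pred == true_label and conf >= thresholds[pred]:
        return False  # confidently unchanged class: edit likely too minimal
    if pred != true_label and conf >= thresholds[pred]:
        return False  # confidently another class: edit likely corrupted the label
    return True       # classifier is uncertain: keep as a useful variation


def filter_edits(edits: List[Tuple[Sequence[float], int]],
                 thresholds: Dict[int, float]) -> List[int]:
    """Return the indices of the edits that survive the filter."""
    return [i for i, (probs, label) in enumerate(edits)
            if keep_edit(probs, label, thresholds)]


if __name__ == "__main__":
    thresholds = {0: 0.9, 1: 0.9, 2: 0.9}  # made-up values for illustration
    edits = [
        ([0.95, 0.03, 0.02], 0),  # confident and correct -> dropped as minimal
        ([0.02, 0.96, 0.02], 0),  # confident and wrong   -> dropped as corrupted
        ([0.60, 0.30, 0.10], 0),  # uncertain             -> kept
    ]
    print(filter_edits(edits, thresholds))  # [2]
```

Using per-class thresholds rather than a single global cutoff lets classes that the classifier is naturally more or less confident about be judged on their own scale; how such thresholds would be chosen is left open here.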
