Dataset Distillation via Committee Voting

Abstract

Dataset distillation aims to synthesize a smaller, representative dataset that preserves the essential properties of the original data, enabling efficient model training with reduced computational resources. Prior work has primarily focused on improving the alignment or matching process between original and synthetic data, or on enhancing the efficiency of distilling large datasets. In this work, we introduce Committee Voting for Dataset Distillation (CV-DD), a novel and orthogonal approach that leverages the collective wisdom of multiple models or experts to create high-quality distilled datasets. We start by showing how to establish a strong baseline that already achieves state-of-the-art accuracy by leveraging recent advancements and thoughtful adjustments in model design and optimization processes. By integrating distributions and predictions from a committee of models while generating high-quality soft labels, our method captures a wider spectrum of data features and reduces both model-specific biases and the adverse effects of distribution shifts, leading to significant improvements in generalization. This voting-based strategy not only promotes diversity and robustness within the distilled dataset but also significantly reduces overfitting, resulting in improved performance on post-eval tasks. Extensive experiments across various datasets and IPCs (images per class) demonstrate that Committee Voting yields more reliable and adaptable distilled data than single-model and multi-model distillation methods, highlighting its potential for efficient and accurate dataset distillation. Code is available at: https://github.com/Jiacheng8/CV-DD.
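To make the idea of a committee producing soft labels concrete, here is a minimal sketch that assumes each committee member outputs logits for the same synthetic image and that the soft label is a weighted average of the members' softmax distributions. The abstract does not specify CV-DD's actual voting rule, member weighting, or temperature, so `committee_soft_label`, `weights`, and `temperature` are hypothetical names chosen for illustration, not the authors' implementation.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over one logit vector."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max logit so exp() cannot overflow
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def committee_soft_label(
    committee_logits: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
    temperature: float = 1.0,
) -> List[float]:
    """Hypothetical committee vote: weighted average of member softmax outputs.

    committee_logits holds one logit vector per committee member, all over
    the same set of classes. weights (assumed, not from the paper) sets how
    much each member's vote counts; equal weights are used if omitted.
    """
    n = len(committee_logits)
    if n == 0:
        raise ValueError("committee must have at least one member")
    if weights is None:
        weights = [1.0] * n  # equal votes by default
    if len(weights) != n:
        raise ValueError("need exactly one weight per committee member")
    total_w = sum(weights)
    num_classes = len(committee_logits[0])
    label = [0.0] * num_classes
    for w, logits in zip(weights, committee_logits):
        probs = softmax(logits, temperature)
        for c, p in enumerate(probs):
            # Normalizing by total_w keeps the result a valid distribution.
            label[c] += (w / total_w) * p
    return label


if __name__ == "__main__":
    # Two members that disagree on the top class yield a label split between them.
    print(committee_soft_label([[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]))
```

Averaging probabilities rather than taking a hard majority vote keeps the inter-class information that soft labels are meant to carry, and a member that is biased toward one class is diluted by the others; whether CV-DD weights members by, for example, their validation accuracy is not stated in the abstract.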
