Beyond neural scaling laws: beating power law scaling via data pruning

Abstract

Widely observed neural scaling laws, in which error falls off as a power of the training set size, model size, or both, have driven substantial performance improvements in deep learning. However, these improvements through scaling alone require considerable costs in compute and energy. Here we focus on the scaling of error with dataset size and show how both in theory and practice we can break beyond power law scaling and reduce it to exponential scaling instead if we have access to a high-quality data pruning metric that ranks the order in which training examples should be discarded to achieve any pruned dataset size. We then test this new exponential scaling prediction with pruned dataset size empirically, and indeed observe better than power law scaling performance on ResNets trained on CIFAR-10, SVHN, and ImageNet. Given the importance of finding high-quality pruning metrics, we perform the first large-scale benchmarking study of ten different data pruning metrics on ImageNet. We find most existing high performing metrics scale poorly to ImageNet, while the best are computationally intensive and require labels for every image. We therefore developed a new simple, cheap and scalable self-supervised pruning metric that demonstrates comparable performance to the best supervised metrics. Overall, our work suggests that the discovery of good data-pruning metrics may provide a viable path forward to substantially improved neural scaling laws, thereby reducing the resource costs of modern deep learning.
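
To make concrete the idea of a pruning metric that "ranks the order in which training examples should be discarded," the following is a minimal illustrative sketch, not the authors' implementation. The function name `prune_by_metric`, the `keep_fraction` parameter, and the convention that lower-scored examples are discarded first are all assumptions made here for illustration; the abstract does not specify how scores are computed or which direction of the ranking is kept.

```python
def prune_by_metric(scores, keep_fraction):
    """Return the sorted indices of examples kept after pruning.

    Hypothetical helper. Assumption: each example has one scalar score,
    and examples with the LOWEST scores are discarded first.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    # Number of examples to keep; always keep at least one.
    n_keep = max(1, round(keep_fraction * len(scores)))
    # Rank example indices from highest score to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Keep the top-ranked examples, returned in original dataset order.
    return sorted(ranked[:n_keep])


# Toy scores for four examples; keep the top half.
kept = prune_by_metric([0.1, 0.9, 0.5, 0.7], keep_fraction=0.5)
print(kept)  # [1, 3]
```

Because a single ranking is reused for every `keep_fraction`, the kept set at a smaller fraction is always a subset of the kept set at a larger one, which is what lets one metric define a pruned dataset of any size.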
