Sharpness-Aware Minimization for Efficiently Improving Generalization

Abstract

In today's heavily overparameterized models, the value of the training loss provides few guarantees on model generalization ability. Indeed, optimizing only the training loss value, as is commonly done, can easily lead to suboptimal model quality. Motivated by the connection between geometry of the loss landscape and generalization -- including a generalization bound that we prove here -- we introduce a novel, effective procedure for instead simultaneously minimizing loss value and loss sharpness. In particular, our procedure, Sharpness-Aware Minimization (SAM), seeks parameters that lie in neighborhoods having uniformly low loss; this formulation results in a min-max optimization problem on which gradient descent can be performed efficiently. We present empirical results showing that SAM improves model generalization across a variety of benchmark datasets (e.g., CIFAR-{10, 100}, ImageNet, finetuning tasks) and models, yielding novel state-of-the-art performance for several. Additionally, we find that SAM natively provides robustness to label noise on par with that provided by state-of-the-art procedures that specifically target learning with noisy labels.
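
The abstract describes the min-max problem only in words. As a rough illustration, the sketch below shows one way such an update could look, assuming an L2 neighborhood of radius rho and a first-order approximation of the inner maximization; neither detail is stated in the abstract. The toy quadratic loss and the names loss, grad, sam_perturbation, sam_step, lr, and rho are hypothetical and chosen only for this example.

```python
import numpy as np


def loss(w):
    # Toy quadratic standing in for the training loss (hypothetical).
    return 0.5 * float(np.sum(w ** 2))


def grad(w):
    # Analytic gradient of the toy loss above.
    return w


def sam_perturbation(g, rho):
    # Approximate inner max: a step of L2 norm rho along the gradient direction.
    # The small constant guards against division by zero.
    return rho * g / (np.linalg.norm(g) + 1e-12)


def sam_step(w, lr=0.1, rho=0.05):
    # Outer min: descend using the gradient evaluated at the perturbed point,
    # which costs two gradient evaluations per step.
    eps = sam_perturbation(grad(w), rho)
    return w - lr * grad(w + eps)


w = np.array([1.0, -2.0])
initial_loss = loss(w)
for _ in range(100):
    w = sam_step(w)
final_loss = loss(w)
```

On this convex toy problem the sharpness term has little to do, so the sketch only shows the mechanics of the two-gradient update, not the generalization benefit reported in the abstract.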
