Abstract
Sharpness-aware minimization (SAM) is a recently proposed method that minimizes the sharpness of the training loss of a neural network. While its generalization improvement is well known and is its primary motivation, we uncover an additional intriguing effect of SAM: a reduction of the feature rank that happens at different layers of a neural network. We show that this low-rank effect occurs very broadly: for different architectures such as fully-connected networks, convolutional networks, and vision transformers, and for different objectives such as regression, classification, and language-image contrastive training. To better understand this phenomenon, we provide a mechanistic understanding of how low-rank features arise in a simple two-layer network. We observe that a significant number of activations get entirely pruned by SAM, which directly contributes to the rank reduction. We confirm this effect theoretically and check that it can also occur in deep networks, although the overall rank reduction mechanism can be more complex, especially for deep networks with pre-activation skip connections and self-attention layers. We make our code available at https://github.com/tml-epfl/sam-low-rank-features.
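The abstract relies on two quantities: the SAM update and the feature rank. The NumPy sketch below makes both concrete. It assumes the standard SAM update, which takes one normalized gradient-ascent step of radius `rho` and then descends using the gradient at that perturbed point. It also assumes that feature rank is measured as the number of principal components needed to explain 99% of the variance of a layer's features. The function names `sam_step` and `feature_rank`, and the defaults for `lr`, `rho`, and `threshold`, are hypothetical illustrations. They are not taken from the linked repository.

```python
import numpy as np


def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM update (illustrative sketch).

    First move to an approximate worst-case point inside an L2 ball of
    radius rho (first-order approximation), then take a gradient step
    from w using the gradient evaluated at that perturbed point.
    """
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # normalized ascent direction
    return w - lr * grad_fn(w + eps)


def feature_rank(features, threshold=0.99):
    """Smallest number of principal components of `features`
    (shape: n_samples x dim) explaining at least `threshold` of the variance.
    """
    centered = features - features.mean(axis=0, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)  # singular values, descending
    var = s ** 2
    cum = np.cumsum(var) / var.sum()
    k = int(np.searchsorted(cum, threshold) + 1)
    return min(k, len(s))


# Toy usage: quadratic loss 0.5 * ||w||^2, whose gradient is w itself.
w_new = sam_step(np.array([1.0, 0.0]), grad_fn=lambda w: w)

# Features that vary along only two of five directions have rank 2.
low_rank = np.array([[1.0, 0, 0, 0, 0],
                     [-1.0, 0, 0, 0, 0],
                     [0, 1.0, 0, 0, 0],
                     [0, -1.0, 0, 0, 0]])
```

On the toy quadratic, SAM evaluates the gradient at a point pushed further up the loss, so its step differs from plain gradient descent. In the paper's setting, `feature_rank` would be applied to the activations of a trained network's layer, collected over a batch of inputs.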