Abstract
Contrastive self-supervised learning based on point-wise comparisons has been widely studied for vision tasks. In the visual cortex of the brain, neuronal responses to distinct stimulus classes are organized into geometric structures known as neural manifolds. Accurate classification of stimuli can be achieved by effectively separating these manifolds, akin to solving a packing problem. We introduce Contrastive Learning As Manifold Packing (CLAMP), a self-supervised framework that recasts representation learning as a manifold packing problem. CLAMP introduces a loss function inspired by the potential energy of short-range repulsive particle systems, such as those encountered in the physics of simple liquids and jammed packings. In this framework, each class consists of sub-manifolds, each embedding multiple augmented views of a single image. The sizes and positions of the sub-manifolds are dynamically optimized by following the gradient of a packing loss. This approach yields interpretable dynamics in the embedding space that parallel jamming physics, and introduces geometrically meaningful hyperparameters within the loss function. Under the standard linear evaluation protocol, which freezes the backbone and trains only a linear classifier, CLAMP achieves performance competitive with state-of-the-art self-supervised models. Furthermore, our analysis reveals that neural manifolds corresponding to different categories emerge naturally and are effectively separated in the learned representation space, highlighting the potential of CLAMP to bridge insights from physics, neuroscience, and machine learning.
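To make the packing picture concrete, the following is a minimal sketch of a short-range repulsive loss between sub-manifolds. It is an illustration under stated assumptions, not the loss defined in this work: each sub-manifold is approximated by a sphere whose center is the mean of its augmented views and whose radius is the root-mean-square spread of those views, and overlapping spheres repel through a harmonic soft-sphere energy. The function name `packing_loss` and the parameters `scale` and `eps` are hypothetical.

```python
import numpy as np

def packing_loss(views, scale=1.0, eps=1e-12):
    """Illustrative soft-sphere repulsion between sub-manifolds.

    views: array of shape (n_images, n_views, dim) holding the embeddings
           of the augmented views of each image.
    scale: hypothetical hyperparameter that rescales every radius.
    """
    # Center of each sub-manifold: the mean over its augmented views.
    centers = views.mean(axis=1)                              # (n, dim)
    # Radius (an assumption): RMS distance of the views from their center.
    sq_dev = ((views - centers[:, None, :]) ** 2).sum(axis=2)  # (n, n_views)
    radii = scale * np.sqrt(sq_dev.mean(axis=1))               # (n,)

    # Pairwise center distances and contact distances r_i + r_j.
    dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    contact = radii[:, None] + radii[None, :] + eps

    # Short-range repulsion: zero unless the two spheres overlap.
    overlap = np.clip(1.0 - dist / contact, 0.0, None)

    # Sum the harmonic energy over distinct pairs only.
    iu = np.triu_indices(len(centers), k=1)
    return float((overlap[iu] ** 2).sum())
```

In this sketch the loss vanishes once no two spheres overlap, which mirrors the short-range character of the interaction described above. In training, the same expression would be written in an automatic-differentiation framework so that its gradient can move and resize the sub-manifolds.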