Neural Weight Compression for Language Models

Abstract

Efficient storage and transmission of language model weights are increasingly critical as model scale and deployment grow. Yet, most existing compression methods rely on handcrafted transforms and heuristics, reflecting the limited understanding of weights as a data modality. This motivates a shift toward learning-based paradigm, where compression schemes are optimized directly from data rather than manually designed. In this work, we take a step in this direction by formulating weight compression as a neural codec learning. We propose Neural Weight Compression (NWC), a flexible framework for training neural codecs on pretrained weight datasets. NWC addresses challenges intrinsic to weight compression, such as tensor shape heterogeneity and the misalignment between training losses and downstream performance, through components such as chunk-and-normalize preprocessing and an importance-aware training objective. Experiments show that NWC achieves state-of-the-art accuracy-compression tradeoffs, particularly at 4--6 bit regime, without relying on rigid handcrafted components such as the Hadamard transform. These gains extend across diverse architectures, e.g., vision encoders. Our analysis further supports the learning-based perspective, highlighting the roles of entropy-constrained quantization in high rate regime and learned transforms in adapting to downstream tasks.

Quick Read (beta)

loading the full paper ...