DAQ: Density-Aware Post-Training Weight-Only Quantization For LLMs

Abstract

Large language models (LLMs) excel in various tasks but face deployment challenges due to hardware constraints. We propose density-aware post-training weight-only quantization (DAQ), which has two stages: 1) density-centric alignment, which identifies the center of high-density weights and centers the dynamic range on this point to align high-density weight regions with floating-point high-precision regions; 2) learnable dynamic range adjustment, which adjusts the dynamic range by optimizing quantization parameters (i.e., scale and zero-point) based on the impact of weights on the model output. Experiments on LLaMA and LLaMA-2 show that DAQ consistently outperforms the best baseline method, reducing perplexity loss by an average of 22.8% on LLaMA and 19.6% on LLaMA-2. Our code is available at https://github.com/LuoYingSong/DAQ.
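The two stages can be illustrated with a small sketch. It is not the authors' implementation (that is in the linked repository), and several pieces are assumptions. The quantization grid is a hypothetical 4-bit floating-point grid in the FP4 (E2M1) style, whose levels are densest near zero. The "center of high-density weights" is estimated as the midpoint of the most populated histogram bin. The paper's learned optimization of scale and zero-point is replaced by a coarse grid search that minimizes an importance-weighted reconstruction error. The helper names (`density_center`, `align`, `adjust`, `weighted_error`) and the `importance` weights, which stand in for each weight's impact on the model output, are all hypothetical.

```python
import random

# Assumption: an FP4 (E2M1)-style grid. Its levels are densest near zero, the
# "high-precision region" that the density center is aligned with here.
_POS = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_GRID = sorted(set(_POS + [-v for v in _POS]))  # 15 distinct levels
GRID_MAX = max(FP4_GRID)


def density_center(weights, bins=64):
    """Hypothetical stage-1 helper: midpoint of the most populated histogram bin."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    counts = [0] * bins
    for w in weights:
        counts[min(int((w - lo) / width), bins - 1)] += 1
    best = max(range(bins), key=lambda i: counts[i])
    return lo + (best + 0.5) * width


def quantize(weights, scale, zero):
    """Round (w - zero) / scale to the nearest grid level, then dequantize.
    Values outside the grid saturate at +/-GRID_MAX."""
    out = []
    for w in weights:
        x = (w - zero) / scale
        q = min(FP4_GRID, key=lambda g: abs(g - x))
        out.append(q * scale + zero)
    return out


def align(weights):
    """Stage-1 sketch: center the range on the density center and size it so the
    farthest weight lands on the largest grid magnitude (fallback 1.0 if all equal)."""
    zero = density_center(weights)
    scale = max(abs(w - zero) for w in weights) / GRID_MAX or 1.0
    return scale, zero


def weighted_error(weights, deq, importance):
    """Importance-weighted squared reconstruction error (proxy for output impact)."""
    return sum(h * (w - d) ** 2 for w, d, h in zip(weights, deq, importance))


def adjust(weights, importance, scale, zero, steps=11):
    """Stage-2 stand-in (not the paper's learned method): grid search over scale
    (0.5x to 1.5x) and zero-point (+/- one scale). Never returns a worse pair
    than the starting one; use an odd `steps` so the start is on the grid."""
    half = steps // 2
    best_err = weighted_error(weights, quantize(weights, scale, zero), importance)
    best = (scale, zero)
    for i in range(steps):
        s = scale * (0.5 + i / (steps - 1))
        for j in range(steps):
            z = zero + (j - half) / half * scale
            err = weighted_error(weights, quantize(weights, s, z), importance)
            if err < best_err:
                best_err, best = err, (s, z)
    return best


# Usage: a bulk of weights near 0.02 plus one outlier that skews a min/max range.
rng = random.Random(0)
weights = [rng.gauss(0.02, 0.05) for _ in range(256)] + [0.9]
importance = [1.0] * len(weights)  # hypothetical: uniform importance
scale, zero = align(weights)
scale2, zero2 = adjust(weights, importance, scale, zero)
print(round(zero, 3), round(scale, 4), round(zero2, 3), round(scale2, 4))
```

Centering on the density peak rather than on the midpoint of the min/max range keeps the bulk of the weights on the finely spaced grid levels near zero, while the outlier is still covered by the widest level. Replacing the uniform `importance` list with per-weight sensitivities changes which reconstruction errors the search penalizes most.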
