Abstract
This paper addresses the efficient compression of weights and optimizer states (called checkpoints) saved at different stages of neural network training. First, we propose a prediction-based compression approach in which values from the previously saved checkpoint are used for context modeling in arithmetic coding. Second, to further improve compression performance, we propose pruning and quantizing the checkpoint values. Experimental results show that our approach achieves a substantial reduction in bit size while enabling near-lossless recovery of training from restored checkpoints, preserving the model's performance and making the approach suitable for storage-limited environments.
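As a rough illustration of why the previous checkpoint is a useful coding context, the sketch below prunes and uniformly quantizes two synthetic "checkpoints" and compares the empirical entropy of the current one with and without conditioning on the co-located value from the previous one. It is not the paper's method: it measures ideal code length (what an arithmetic coder could approach) rather than implementing a coder, and the function names, the pruning threshold, the quantization step, and the Gaussian toy data are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def prune_and_quantize(values, prune_threshold, step):
    """Set small-magnitude values to 0, then map the rest to integer bins.

    Both parameters are hypothetical knobs chosen for this sketch.
    """
    return [0 if abs(v) < prune_threshold else round(v / step) for v in values]


def entropy_bits(symbols):
    """Empirical entropy in bits per symbol, with no context."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def conditional_entropy_bits(symbols, contexts):
    """Empirical entropy of each symbol given its co-located context symbol."""
    n = len(symbols)
    joint = Counter(zip(contexts, symbols))
    context_counts = Counter(contexts)
    return -sum(
        c / n * math.log2(c / context_counts[ctx])
        for (ctx, _sym), c in joint.items()
    )


def demo(num_values=20000, seed=0):
    """Compare coding cost with and without the previous checkpoint as context."""
    rng = random.Random(seed)
    # Toy data: the current checkpoint is the previous one plus a small update.
    previous = [rng.gauss(0.0, 1.0) for _ in range(num_values)]
    current = [p + rng.gauss(0.0, 0.05) for p in previous]

    q_previous = prune_and_quantize(previous, prune_threshold=0.3, step=0.25)
    q_current = prune_and_quantize(current, prune_threshold=0.3, step=0.25)

    without_context = entropy_bits(q_current)
    with_context = conditional_entropy_bits(q_current, q_previous)
    return without_context, with_context


if __name__ == "__main__":
    h0, h1 = demo()
    print(f"bits/value without context: {h0:.3f}")
    print(f"bits/value with previous checkpoint as context: {h1:.3f}")
```

On this toy data the conditional entropy is lower because consecutive checkpoints are strongly correlated; how large the gap is on real checkpoints depends on the model, the training stage, and the quantization settings.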