Abstract
We present UniMIC, a universal multi-modality image compression framework that aims to unify rate-distortion-perception (RDP) optimization for multiple image codecs simultaneously by mining cross-modality generative priors. Unlike most existing works, which design and optimize image codecs from scratch, UniMIC introduces a visual codec repository that incorporates a wide range of representative image codecs and directly uses them as the basic codecs for various practical applications. Moreover, we propose multi-grained textual coding, in which variable-length content prompts and compression prompts are designed and encoded to assist perceptual reconstruction through multi-modality conditional generation. In particular, a universal perception compensator is proposed to improve the perceptual quality of images decoded by all basic codecs at the decoder side by reusing text-assisted diffusion priors from Stable Diffusion. Through the cooperation of these three strategies, UniMIC achieves a significant improvement in RDP optimization for different compression codecs, e.g., traditional and learnable codecs, and at different compression costs, e.g., ultra-low bitrates. The code will be available at https://github.com/Amygyx/UniMIC.