Comp-X: On Defining an Interactive Learned Image Compression Paradigm With Expert-driven LLM Agent

  • 2025-08-21 05:09:30
  • Yixin Gao, Xin Li, Xiaohan Pan, Runsen Feng, Bingchen Li, Yunpeng Qi, Yiting Lu, Zhengxue Cheng, Zhibo Chen, Jörn Ostermann

Abstract

We present Comp-X, the first intelligently interactive image compression paradigm empowered by the impressive reasoning capability of a large language model (LLM) agent. Commonly used image codecs usually offer only a limited set of coding modes and rely on engineers to select a mode manually, which makes them unfriendly to non-expert users. To overcome this, we advance the evolution of the image coding paradigm with three key innovations: (i) a multi-functional coding framework, which unifies coding modes serving different objectives and requirements, including human-machine perception, variable coding, and spatial bit allocation, in one framework; (ii) an interactive coding agent, for which we propose an augmented in-context learning method with coding-expert feedback that teaches the LLM agent how to understand a coding request, select a mode, and use the coding tools; and (iii) IIC-bench, the first dedicated benchmark systematically designed for evaluating intelligently interactive image compression, comprising diverse user requests and the corresponding annotations from coding experts. Extensive experimental results demonstrate that Comp-X understands coding requests efficiently and achieves impressive textual interaction capability. Meanwhile, it maintains comparable compression performance even with a single coding framework, providing a promising avenue toward artificial general intelligence (AGI) in image compression.
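The abstract describes an agent that turns a free-form request into a choice of coding mode (human vs. machine perception, rate control, spatial bit allocation) plus tool parameters. The sketch below only illustrates the shape of that mapping under loud assumptions: simple keyword rules stand in for the LLM agent and its in-context learning, a single scalar `quality` stands in for variable coding, and `CodingConfig`, `parse_request`, and `quality_map` are hypothetical names, not the Comp-X interface.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), end coordinates exclusive


@dataclass
class CodingConfig:
    """Hypothetical coding configuration; field names are illustrative only."""
    target: str = "human"   # "human" or "machine" perception mode
    quality: float = 0.5    # assumed rate knob in [0, 1]; higher means more bits
    roi_boxes: List[Box] = field(default_factory=list)  # regions that get extra bits


def parse_request(request: str, roi: Optional[Box] = None) -> CodingConfig:
    """Keyword rules standing in for the LLM agent's request understanding."""
    text = request.lower()
    cfg = CodingConfig()
    if any(k in text for k in ("detection", "segmentation", "recognition", "machine")):
        cfg.target = "machine"
    if any(k in text for k in ("small file", "low bitrate", "save space")):
        cfg.quality = 0.2
    elif any(k in text for k in ("high quality", "best quality")):
        cfg.quality = 0.9
    # Spatial bit allocation only applies when the user also points at a region.
    if roi is not None and any(k in text for k in ("face", "region", "focus")):
        cfg.roi_boxes.append(roi)
    return cfg


def quality_map(cfg: CodingConfig, height: int, width: int,
                roi_boost: float = 0.3) -> List[List[float]]:
    """Per-pixel quality map: base quality everywhere, boosted (capped at 1) inside ROIs."""
    qmap = [[cfg.quality] * width for _ in range(height)]
    for x0, y0, x1, y1 in cfg.roi_boxes:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                qmap[y][x] = min(1.0, cfg.quality + roi_boost)
    return qmap


if __name__ == "__main__":
    cfg = parse_request("Keep the face sharp but make it a small file", roi=(1, 1, 3, 3))
    print(cfg)
    for row in quality_map(cfg, 4, 4):
        print(row)
```

In the paradigm the abstract describes, the rule-based `parse_request` would be replaced by the LLM agent, and the returned configuration would drive the single multi-functional codec rather than a toy quality map.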

 
