Q-Insight: Understanding Image Quality via Visual Reinforcement Learning

Abstract

Image quality assessment (IQA) focuses on the perceptual visual quality of images, playing a crucial role in downstream tasks such as image reconstruction, compression, and generation. The rapid advancement of multi-modal large language models (MLLMs) has significantly broadened the scope of IQA, moving toward comprehensive image quality understanding that incorporates content analysis, degradation perception, and comparison reasoning beyond mere numerical scoring. Previous MLLM-based methods typically either generate numerical scores lacking interpretability or heavily rely on supervised fine-tuning (SFT) using large-scale annotated datasets to provide descriptive assessments, limiting their flexibility and applicability. In this paper, we propose Q-Insight, a reinforcement learning-based model built upon group relative policy optimization (GRPO), which demonstrates strong visual reasoning capability for image quality understanding while requiring only a limited amount of rating scores and degradation labels. By jointly optimizing score regression and degradation perception tasks with carefully designed reward functions, our approach effectively exploits their mutual benefits for enhanced performance. Extensive experiments demonstrate that Q-Insight substantially outperforms existing state-of-the-art methods in both score regression and degradation perception tasks, while exhibiting impressive zero-shot generalization to comparison reasoning tasks. Code will be available at https://github.com/lwq20020127/Q-Insight.
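
To make the GRPO idea concrete, the following is a minimal sketch, not the authors' implementation. It shows only the group-relative advantage step that GRPO is named for: several responses are sampled for one prompt, each is given a scalar reward, and each reward is normalized against the mean and standard deviation of its group. The two reward functions, their names, the tolerance threshold, and the equal weighting of the two tasks are all assumptions made for illustration; the abstract does not specify how the rewards are designed.

```python
from statistics import mean, pstdev


def score_reward(predicted, target, tolerance=0.5):
    """Hypothetical score-regression reward (an assumption, not from the paper):
    1.0 if the predicted quality score is within `tolerance` of the label."""
    return 1.0 if abs(predicted - target) <= tolerance else 0.0


def degradation_reward(predicted_label, target_label):
    """Hypothetical degradation-perception reward (an assumption):
    1.0 for an exact match on the degradation label, else 0.0."""
    return 1.0 if predicted_label == target_label else 0.0


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    This is the group-relative step of GRPO: the advantage of a response
    is its reward minus the group mean, divided by the group standard
    deviation. `eps` guards against a zero standard deviation.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative group of four sampled responses for one image.
# Every value here is made up for the example.
sampled = [
    {"score": 3.1, "degradation": "blur"},
    {"score": 4.6, "degradation": "noise"},
    {"score": 2.8, "degradation": "blur"},
    {"score": 1.0, "degradation": "blur"},
]
label = {"score": 3.0, "degradation": "blur"}

# Assumed equal weighting of the two task rewards.
rewards = [
    score_reward(s["score"], label["score"])
    + degradation_reward(s["degradation"], label["degradation"])
    for s in sampled
]
advantages = group_relative_advantages(rewards)
```

Responses that beat the group average receive a positive advantage and those below it a negative one, so no separately trained value model is needed to rank them.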
