Abstract
Image quality assessment (IQA) is an important research topic for understanding and improving visual experience. The current state-of-the-art IQA methods are based on convolutional neural networks (CNNs). The performance of CNN-based models is often compromised by the fixed shape constraint in batch training. To accommodate this, the input images are usually resized and cropped to a fixed shape, causing image quality degradation. To address this, we design a multi-scale image quality Transformer (MUSIQ) to process native resolution images with varying sizes and aspect ratios. With a multi-scale image representation, our proposed method can capture image quality at different granularities. Furthermore, a novel hash-based 2D spatial embedding and a scale embedding are proposed to support the positional embedding in the multi-scale representation. Experimental results verify that our method can achieve state-of-the-art performance on multiple large-scale IQA datasets such as PaQ-2-PiQ, SPAQ and KonIQ-10k.
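As an illustration only, the idea of a hash-based 2D spatial embedding can be sketched as a mapping from each patch position to a cell of a fixed-size lookup grid, so that images with different sizes and aspect ratios index the same embedding table. The abstract does not give the mapping; the proportional rule below, the function name `hash_positions`, and the parameter `grid_size` are assumptions made for this sketch rather than the method's definition.

```python
def hash_positions(rows, cols, grid_size):
    """Map each patch (i, j) of a rows x cols patch grid to a cell of a
    fixed grid_size x grid_size table.

    Assumption: positions are scaled proportionally and floored. This is
    one plausible reading of "hash-based", not a rule stated in the text.
    """
    return [
        [(i * grid_size // rows, j * grid_size // cols) for j in range(cols)]
        for i in range(rows)
    ]


# A wide image (2 x 4 patches) and a tall image (4 x 2 patches) both map
# into the same hypothetical 10 x 10 table of spatial embeddings.
wide = hash_positions(2, 4, 10)
tall = hash_positions(4, 2, 10)
```

Under this assumed rule every index stays below `grid_size`, so a single learnable table of `grid_size * grid_size` entries could serve inputs of any resolution; a scale embedding would then be added separately for each level of the multi-scale representation.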