Abstract
Although recent diffusion-based single-step super-resolution methods achieve better performance than SinSR, they are computationally complex. To improve the performance of SinSR, we investigate preserving high-frequency detail features during super-resolution (SR), because degraded images lack detailed information. For this purpose, we introduce a high-frequency perceptual loss that uses an invertible neural network (INN) pretrained on the ImageNet dataset. Different feature maps of the pretrained INN capture different high-frequency aspects of an image. During training, we enforce consistency between the high-frequency features of the super-resolved and ground-truth (GT) images, which improves SR image quality during inference. Furthermore, we use the Jensen-Shannon divergence between GT and SR images in the pretrained DINO-v2 embedding space to match their distributions. By introducing the $\textbf{h}$igh-$\textbf{f}$requency preserving loss and the distribution matching constraint into single-step $\textbf{diff}$usion-based SR ($\textbf{HF-Diff}$), we achieve state-of-the-art CLIPIQA scores on the benchmark RealSR, RealSet65, DIV2K-Val, and ImageNet datasets. Moreover, experimental results on several datasets demonstrate that our high-frequency perceptual loss yields better SR image quality than LPIPS and VGG-based perceptual losses. Our code will be released at https://github.com/shoaib-sami/HF-Diff.
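The abstract does not state how the two training terms are computed, so the following is only a minimal sketch under explicit assumptions. All function names and the weights `lam_hf` and `lam_js` are hypothetical. Plain Python lists stand in for the INN feature maps and the DINO-v2 embeddings. The high-frequency term is assumed to be a mean absolute (L1) difference averaged over feature maps. The embeddings are assumed to be turned into probability distributions with a softmax before the Jensen-Shannon divergence is taken. None of this should be read as the released HF-Diff implementation.

```python
import math
from typing import Sequence


def softmax(x: Sequence[float]) -> list[float]:
    # ASSUMPTION: an embedding vector is mapped to a distribution via softmax.
    m = max(x)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in x]
    s = sum(exps)
    return [e / s for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    # KL(p || q); terms with p_i == 0 contribute nothing by convention.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p: Sequence[float], q: Sequence[float]) -> float:
    # JS(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), with m the midpoint distribution.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def hf_feature_loss(sr_feats: Sequence[Sequence[float]],
                    gt_feats: Sequence[Sequence[float]]) -> float:
    # HYPOTHETICAL high-frequency perceptual term: mean L1 distance per
    # (flattened) feature map, averaged over all maps of the pretrained INN.
    per_map = [sum(abs(a - b) for a, b in zip(s, g)) / len(s)
               for s, g in zip(sr_feats, gt_feats)]
    return sum(per_map) / len(per_map)


def total_loss(sr_feats, gt_feats, sr_emb, gt_emb,
               lam_hf: float = 1.0, lam_js: float = 1.0) -> float:
    # HYPOTHETICAL weighted sum of the two constraints described in the abstract.
    js = jensen_shannon(softmax(sr_emb), softmax(gt_emb))
    return lam_hf * hf_feature_loss(sr_feats, gt_feats) + lam_js * js
```

With identical SR and GT inputs both terms vanish, and the Jensen-Shannon divergence is symmetric and bounded above by $\ln 2$ (in nats). These two properties make it a well-behaved distribution-matching penalty compared with a one-sided KL divergence.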