Activating More Pixels in Image Super-Resolution Transformer

Abstract

Transformer-based methods have shown impressive performance in low-level vision tasks, such as image super-resolution. However, through attribution analysis we find that these networks can only utilize a limited spatial range of input information. This implies that the potential of Transformers is still not fully exploited in existing networks. To activate more input pixels for reconstruction, we propose a novel Hybrid Attention Transformer (HAT). It combines channel attention and self-attention schemes, thus making use of their complementary advantages. Moreover, to better aggregate cross-window information, we introduce an overlapping cross-attention module to enhance the interaction between neighboring window features. In the training stage, we additionally propose a same-task pre-training strategy to bring further improvement. Extensive experiments show the effectiveness of the proposed modules, and the overall method significantly outperforms the state-of-the-art methods by more than 1 dB. Code and models will be available at https://github.com/chxy95/HAT.
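The idea of combining channel attention with self-attention can be illustrated with a minimal NumPy sketch. This is not the HAT implementation. The following are all assumptions chosen for brevity:

- the function names
- the squeeze-and-excitation style channel attention
- the identity query/key/value projections
- the branch weight `alpha = 0.01`

The sketch also omits LayerNorm, multi-head attention, shifted windows, convolutions, and the overlapping cross-attention module. What it does show is why the combination can reach more pixels. Window self-attention mixes information only inside each window, whereas channel attention pools statistics over the whole image.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def window_self_attention(x, window):
    """Single-head self-attention inside non-overlapping window x window patches.

    x: (H, W, C) feature map. Assumption: identity Q/K/V projections, no
    positional bias, so each output pixel depends only on its own window.
    """
    H, W, C = x.shape
    if H % window or W % window:
        raise ValueError("H and W must be divisible by the window size")
    out = np.empty_like(x)
    for i in range(0, H, window):
        for j in range(0, W, window):
            tokens = x[i:i + window, j:j + window].reshape(-1, C)  # (window*window, C)
            scores = tokens @ tokens.T / np.sqrt(C)                # scaled dot products
            mixed = softmax(scores) @ tokens                       # attention-weighted sum
            out[i:i + window, j:j + window] = mixed.reshape(window, window, C)
    return out


def channel_attention(x, w1, w2):
    """Squeeze-and-excitation style channel attention (an assumed stand-in).

    Global average pool -> FC -> ReLU -> FC -> sigmoid -> rescale channels.
    The global pool makes every output pixel depend on every input pixel.
    """
    pooled = x.mean(axis=(0, 1))                     # (C,)
    hidden = np.maximum(pooled @ w1, 0.0)            # (C // r,)
    weights = 1.0 / (1.0 + np.exp(-(hidden @ w2)))   # (C,), each in (0, 1)
    return x * weights                               # broadcast over H and W


def hybrid_attention_block(x, window, w1, w2, alpha=0.01):
    # Hypothetical hybrid block: residual + window self-attention
    # + a down-weighted channel-attention branch (alpha is an assumed value).
    return x + window_self_attention(x, window) + alpha * channel_attention(x, w1, w2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, r = 8, 2
    x = rng.standard_normal((8, 8, C))
    w1 = rng.standard_normal((C, C // r))
    w2 = rng.standard_normal((C // r, C))
    y = hybrid_attention_block(x, window=4, w1=w1, w2=w2)
    print(y.shape)  # (8, 8, 8)
```

In this sketch, perturbing a pixel in the bottom-right window leaves `window_self_attention` unchanged at pixel (0, 0). The output of `hybrid_attention_block` at (0, 0) usually changes, because the channel branch pools globally. That global dependency is the intuition behind pairing the two mechanisms. The paper's attribution analysis is what actually measures how many input pixels a trained network uses.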
