MB-TaylorFormer V2: Improved Multi-branch Linear Transformer Expanded by Taylor Formula for Image Restoration

Abstract

Recently, Transformer networks have demonstrated outstanding performance in the field of image restoration due to their global receptive field and adaptability to the input. However, the quadratic computational complexity of Softmax-attention significantly limits their broad application in image restoration tasks, particularly for high-resolution images. To tackle this challenge, we propose a novel variant of the Transformer. This variant leverages the Taylor expansion to approximate Softmax-attention and uses the concept of norm-preserving mapping to approximate the remainder of the first-order Taylor expansion, resulting in linear computational complexity. Moreover, we introduce into the proposed Transformer a multi-branch architecture featuring multi-scale patch embedding, which has four distinct advantages: 1) various sizes of the receptive field; 2) multi-level semantic information; 3) flexible shapes of the receptive field; 4) accelerated training and inference speed. Hence, the proposed model, named the second version of the Taylor formula expansion-based Transformer (MB-TaylorFormer V2 for short), can concurrently process coarse-to-fine features, capture long-distance pixel interactions at limited computational cost, and improve the approximation of the Taylor expansion remainder. Experimental results across diverse image restoration benchmarks demonstrate that MB-TaylorFormer V2 achieves state-of-the-art performance in multiple image restoration tasks, such as image dehazing, deraining, desnowing, motion deblurring, and denoising, with very little computational overhead. The source code is available at https://github.com/FVL2020/MB-TaylorFormerV2.
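To illustrate why a first-order Taylor expansion yields linear complexity, here is a minimal sketch, not the authors' implementation. It replaces exp(q·k) with 1 + q·k and then uses associativity: the sum over keys of (1 + q·k_j) v_j can be computed as sum_j v_j + q^T (K^T V), so K^T V is formed once and reused for every query. Several assumptions are made here. Queries and keys are L2-normalized so that 1 + q·k >= 0, and the usual 1/sqrt(d) scaling is omitted. The norm-preserving approximation of the remainder and the multi-branch architecture are not modeled. All function names are hypothetical and do not come from the released code.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(rows):
    # Assumption: unit-normalize each vector so q.k lies in [-1, 1] and 1 + q.k >= 0.
    return [[x / math.sqrt(dot(r, r)) for x in r] for r in rows]

def taylor_attention_quadratic(Q, K, V):
    # O(N^2) reference: exp(q.k) is replaced by its first-order Taylor term 1 + q.k.
    out = []
    for q in Q:
        w = [1.0 + dot(q, k) for k in K]
        s = sum(w)
        out.append([sum(wj * v[c] for wj, v in zip(w, V)) / s for c in range(len(V[0]))])
    return out

def taylor_attention_linear(Q, K, V):
    # Same result in O(N) tokens, using associativity:
    #   sum_j (1 + q.k_j) v_j = sum_j v_j + q^T (K^T V)
    #   sum_j (1 + q.k_j)     = N + q . sum_j k_j
    n, d, dv = len(K), len(K[0]), len(V[0])
    k_sum = [sum(k[i] for k in K) for i in range(d)]          # shape (d,)
    v_sum = [sum(v[c] for v in V) for c in range(dv)]         # shape (dv,)
    kv = [[sum(k[i] * v[c] for k, v in zip(K, V)) for c in range(dv)]
          for i in range(d)]                                  # K^T V, shape (d, dv)
    out = []
    for q in Q:
        denom = n + dot(q, k_sum)
        num = [v_sum[c] + sum(q[i] * kv[i][c] for i in range(d)) for c in range(dv)]
        out.append([x / denom for x in num])
    return out

# Usage: the linear form should agree with the quadratic form up to float rounding.
random.seed(0)
N, d, dv = 6, 4, 3
Q = l2_normalize([[random.gauss(0, 1) for _ in range(d)] for _ in range(N)])
K = l2_normalize([[random.gauss(0, 1) for _ in range(d)] for _ in range(N)])
V = [[random.gauss(0, 1) for _ in range(dv)] for _ in range(N)]
fast = taylor_attention_linear(Q, K, V)
slow = taylor_attention_quadratic(Q, K, V)
print(max(abs(a - b) for ra, rb in zip(fast, slow) for a, b in zip(ra, rb)))
```

The precomputation of K^T V costs O(N·d·dv) and each query then costs O(d·dv), so the total grows linearly with the number of tokens N rather than quadratically.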
