Frequency Dynamic Convolution for Dense Image Prediction

Abstract

While Dynamic Convolution (DY-Conv) has shown promising performance by enabling adaptive weight selection through multiple parallel weights combined with an attention mechanism, the frequency response of these weights tends to exhibit high similarity, resulting in high parameter costs but limited adaptability. In this work, we introduce Frequency Dynamic Convolution (FDConv), a novel approach that mitigates these limitations by learning a fixed parameter budget in the Fourier domain. FDConv divides this budget into frequency-based groups with disjoint Fourier indices, enabling the construction of frequency-diverse weights without increasing the parameter cost. To further enhance adaptability, we propose Kernel Spatial Modulation (KSM) and Frequency Band Modulation (FBM). KSM dynamically adjusts the frequency response of each filter at the spatial level, while FBM decomposes weights into distinct frequency bands in the frequency domain and modulates them dynamically based on local content. Extensive experiments on object detection, segmentation, and classification validate the effectiveness of FDConv. We demonstrate that when applied to ResNet-50, FDConv achieves superior performance with a modest increase of +3.6M parameters, outperforming previous methods that require substantial increases in parameter budgets (e.g., CondConv +90M, KW +76.5M). Moreover, FDConv seamlessly integrates into a variety of architectures, including ConvNeXt and Swin-Transformer, offering a flexible and efficient solution for modern vision tasks. The code is made publicly available at https://github.com/Linwei-Chen/FDConv.
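
The idea of splitting one fixed Fourier-domain parameter budget into groups with disjoint Fourier indices can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (the linked repository is the reference for that). The function name `frequency_group_weights`, the radial low-to-high ordering of frequencies, the equal-sized groups, and taking the real part of the inverse FFT are all assumptions made for illustration only.

```python
import numpy as np

def frequency_group_weights(spectrum, n_groups):
    """Hypothetical helper: split one Fourier-domain parameter grid into
    disjoint frequency groups and map each group to a spatial weight."""
    h, w = spectrum.shape
    # Radial frequency of every Fourier index (assumed ordering criterion).
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fy ** 2 + fx ** 2)
    # Sort indices from low to high frequency, then cut into disjoint groups.
    order = np.argsort(radius.ravel(), kind="stable")
    groups = np.array_split(order, n_groups)

    masks, weights = [], []
    for idx in groups:
        mask = np.zeros(h * w, dtype=bool)
        mask[idx] = True
        mask = mask.reshape(h, w)
        # Keep only this group's coefficients and zero the rest, then return
        # to the spatial domain. Taking .real is a simplification here.
        weights.append(np.fft.ifft2(np.where(mask, spectrum, 0)).real)
        masks.append(mask)
    return np.stack(masks), np.stack(weights)

# One shared budget of 8 x 8 Fourier coefficients, split into 4 groups.
rng = np.random.default_rng(0)
spectrum = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
masks, weights = frequency_group_weights(spectrum, n_groups=4)

# Every Fourier index belongs to exactly one group, so the number of learned
# coefficients stays the same no matter how many weights are built from it.
print(masks.sum(axis=0).min(), masks.sum(axis=0).max())
```

Because each group covers a different band of the spectrum, the resulting spatial weights have different frequency responses by construction, and by linearity of the inverse transform they sum to the weight obtained from the full spectrum.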
