Abstract
We revisit the relationship between attention mechanisms and large kernel ConvNets in visual transformers and propose a new spatial attention named Large Kernel Convolutional Attention (LKCA). It simplifies the attention operation by replacing it with a single large kernel convolution. LKCA combines the advantages of convolutional neural networks and visual transformers, possessing a large receptive field, locality, and parameter sharing. We explain the superiority of LKCA from both the convolution and the attention perspectives, providing equivalent code implementations for each view. Experiments confirm that the convolutional and attention implementations of LKCA achieve equivalent performance. We extensively evaluate the LKCA variant of ViT on both classification and segmentation tasks, and the results demonstrate that LKCA achieves competitive performance on visual tasks. Our code will be made publicly available at https://github.com/CatworldLee/LKCA.
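To illustrate the claim that one large kernel convolution can be read as a form of spatial attention, here is a minimal NumPy sketch. It is not the authors' released code. It assumes a depthwise, zero-padded "same" convolution over an H x W token grid with an odd kernel size K. The functions `conv_view` and `attention_view` are hypothetical names chosen for illustration. The convolution view slides a K x K window over the grid. The attention view builds, for each channel, a fixed N x N "attention" matrix whose entry for a query/key pair is the kernel weight at their relative offset, or zero outside the window. Unlike softmax attention, this matrix does not depend on the input.

```python
import numpy as np


def conv_view(x, kernel):
    """Depthwise 'same' cross-correlation with zero padding.

    x: (H, W, C) token grid; kernel: (K, K, C) with odd K (one kernel per channel).
    """
    H, W, C = x.shape
    K = kernel.shape[0]
    r = K // 2
    padded = np.pad(x, ((r, r), (r, r), (0, 0)))  # zero padding on the spatial axes
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            window = padded[i:i + K, j:j + K, :]  # (K, K, C) neighbourhood of token (i, j)
            out[i, j] = (window * kernel).sum(axis=(0, 1))
    return out


def attention_view(x, kernel):
    """Same operation written as out = A @ V with a fixed, per-channel N x N matrix A.

    A[q, k] is the kernel weight at the relative offset (k - q) when that offset lies
    inside the K x K window, and 0 otherwise (locality). Every query row reuses the
    same K*K weights (parameter sharing); a larger K widens the receptive field.
    """
    H, W, C = x.shape
    K = kernel.shape[0]
    r = K // 2
    N = H * W
    tokens = x.reshape(N, C)  # row-major flattening: token (i, j) -> index i * W + j
    out = np.zeros((N, C))
    for c in range(C):
        A = np.zeros((N, N))
        for qi in range(H):
            for qj in range(W):
                # Keys outside the grid are omitted, matching zero padding.
                for ki in range(max(0, qi - r), min(H, qi + r + 1)):
                    for kj in range(max(0, qj - r), min(W, qj + r + 1)):
                        A[qi * W + qj, ki * W + kj] = kernel[ki - qi + r, kj - qj + r, c]
        out[:, c] = A @ tokens[:, c]
    return out.reshape(H, W, C)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((6, 5, 3))       # 6 x 5 token grid, 3 channels
    kernel = rng.standard_normal((7, 7, 3))  # kernel larger than the grid
    print(np.allclose(conv_view(x, kernel), attention_view(x, kernel)))  # prints True
```

The loops are deliberately explicit so that the correspondence between the two views is easy to check. A practical implementation would more likely use a grouped convolution (for example, a depthwise `Conv2d` in a deep learning framework) instead of materialising the N x N matrix.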