CAMS: Convolution and Attention-Free Mamba-based Cardiac Image Segmentation

Abstract

Convolutional Neural Networks (CNNs) and Transformer-based self-attention models have become the standard for medical image segmentation. This paper demonstrates that convolution and self-attention, while widely used, are not the only effective methods for segmentation. Breaking with convention, we present a Convolution and self-Attention-free Mamba-based semantic Segmentation Network named CAMS-Net. Specifically, we design a Mamba-based Channel Aggregator and Spatial Aggregator, which are applied independently in each encoder-decoder stage. The Channel Aggregator extracts information across different channels, and the Spatial Aggregator learns features across different spatial locations. We also propose a Linearly Interconnected Factorized Mamba (LIFM) block to reduce the computational complexity of a Mamba block and to enhance its decision function by introducing a non-linearity between two factorized Mamba blocks. Our model outperforms existing state-of-the-art CNN, self-attention, and Mamba-based methods on the CMR and M&Ms-2 cardiac segmentation datasets while achieving linear complexity and reducing the number of parameters. These results suggest that convolution- and self-attention-free designs can inspire further research beyond the CNN and Transformer paradigms. Source code and pre-trained models will be made publicly available upon acceptance.
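To make the distinction between the two aggregators concrete, the following is a minimal sketch and not the authors' implementation. It assumes a feature map flattened to a (positions, channels) grid and uses a toy linear recurrence, h_t = decay * h_(t-1) + x_t, as a stand-in for a Mamba selective scan. The names `toy_scan`, `spatial_aggregate`, and `channel_aggregate`, and the `decay` parameter, are hypothetical and chosen only for illustration. The sketch shows one idea: the same linear-time sequence operator can mix information across spatial locations or across channels, depending on which axis is treated as the sequence.

```python
from typing import List

Matrix = List[List[float]]  # feature map flattened to (positions, channels)


def toy_scan(seq: List[float], decay: float = 0.5) -> List[float]:
    """Toy linear-time recurrence h_t = decay * h_{t-1} + x_t.

    Assumption: this is only a stand-in for a Mamba selective scan,
    which uses learned, input-dependent state-space parameters.
    """
    out: List[float] = []
    h = 0.0
    for x in seq:
        h = decay * h + x
        out.append(h)
    return out


def spatial_aggregate(feat: Matrix, decay: float = 0.5) -> Matrix:
    """Scan along spatial positions, independently for each channel."""
    n_pos, n_ch = len(feat), len(feat[0])
    # One scanned sequence per channel, each of length n_pos.
    cols = [toy_scan([feat[p][c] for p in range(n_pos)], decay) for c in range(n_ch)]
    # Transpose back to (positions, channels).
    return [[cols[c][p] for c in range(n_ch)] for p in range(n_pos)]


def channel_aggregate(feat: Matrix, decay: float = 0.5) -> Matrix:
    """Scan along channels, independently for each spatial position."""
    return [toy_scan(row, decay) for row in feat]


if __name__ == "__main__":
    feat = [[1.0, 2.0], [3.0, 4.0]]  # 2 positions, 2 channels
    print(spatial_aggregate(feat))
    print(channel_aggregate(feat))
```

Both functions visit each element once, so the cost grows linearly with the number of positions and channels, which is the property the abstract attributes to the Mamba-based design. How the actual aggregators are parameterized and combined within each encoder-decoder stage is not specified in the abstract and is not modeled here.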
