Abstract
Image segmentation plays a vital role in diagnosis and treatment in the medical domain. Traditional convolutional neural networks (CNNs) and Transformer models have made significant advances in this area, but they still face challenges because of limited receptive fields or high computational complexity. Recently, State Space Models (SSMs), particularly Mamba and its variants, have demonstrated notable performance in vision tasks. However, their feature extraction methods may not be sufficiently effective, and they retain some redundant structures, leaving room for parameter reduction. Motivated by previous spatial and channel attention methods, we propose Triplet Mamba-UNet (TM-UNet). The method leverages residual VSS Blocks to extract intensive contextual features, while Triplet SSM is employed to fuse features across spatial and channel dimensions. We conducted experiments on the ISIC17, ISIC18, CVC-300, CVC-ClinicDB, Kvasir-SEG, CVC-ColonDB, and Kvasir-Instrument datasets, demonstrating the superior segmentation performance of our proposed TM-UNet. Additionally, compared to the previous VM-UNet, our model achieves a one-third reduction in parameters.