Topo-VM-UNetV2: Encoding Topology into Vision Mamba UNet for Polyp Segmentation

Abstract

Convolutional neural network (CNN) and Transformer-based architectures are the two dominant deep learning models for polyp segmentation. However, CNNs have limited capability for modeling long-range dependencies, while Transformers incur quadratic computational complexity. Recently, State Space Models such as Mamba have been recognized as a promising approach for polyp segmentation because they not only model long-range interactions effectively but also maintain linear computational complexity. However, Mamba-based architectures still struggle to capture topological features (e.g., connected components, loops, voids), leading to inaccurate boundary delineation and polyp segmentation. To address these limitations, we propose a new approach called Topo-VM-UNetV2, which encodes topological features into the Mamba-based state-of-the-art polyp segmentation model, VM-UNetV2. Our method consists of two stages. In Stage 1, VM-UNetV2 is used to generate probability maps (PMs) for the training and test images, which are then used to compute topology attention maps. Specifically, we first compute persistence diagrams of the PMs; we then generate persistence score maps by assigning the persistence value (i.e., the difference between death and birth times) of each topological feature to its birth location; finally, we transform the persistence scores into attention weights using the sigmoid function. In Stage 2, these topology attention maps are integrated into the semantics and detail infusion (SDI) module of VM-UNetV2 to form a topology-guided semantics and detail infusion (Topo-SDI) module for enhancing the segmentation results. Extensive experiments on five public polyp segmentation datasets demonstrate the effectiveness of our proposed method. The code will be made publicly available.
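The last two steps of Stage 1 (persistence score map, then sigmoid) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the persistence pairs of a probability map have already been computed elsewhere (for example by a cubical-complex persistence library) and are supplied as hypothetical `(birth, death, (row, col))` tuples, where `(row, col)` is the birth location. The function name `topology_attention_map`, the use of the absolute difference (so that either filtration direction works), the accumulation of scores when several features share a birth pixel, and the score of 0 for pixels with no feature are all assumptions made for illustration.

```python
import numpy as np


def topology_attention_map(shape, pairs):
    """Build a topology attention map from persistence pairs.

    shape: (height, width) of the probability map.
    pairs: iterable of (birth, death, (row, col)) tuples, where (row, col)
           is the birth location of the topological feature (assumed input).
    """
    scores = np.zeros(shape, dtype=np.float64)
    for birth, death, (row, col) in pairs:
        # Persistence = |death - birth|; abs() is an assumption that makes
        # the sketch independent of the filtration direction.
        scores[row, col] += abs(death - birth)
    # Sigmoid turns persistence scores into attention weights in (0, 1).
    # Pixels with no feature keep score 0 and therefore get weight 0.5.
    return 1.0 / (1.0 + np.exp(-scores))


# Toy example: two features on a 4x4 probability map.
pairs = [(0.9, 0.2, (1, 1)), (0.6, 0.5, (3, 2))]
attn = topology_attention_map((4, 4), pairs)
```

With these assumptions, a long-lived feature (persistence 0.7 at pixel `(1, 1)`) receives a larger weight than a short-lived one (persistence 0.1 at pixel `(3, 2)`), and both exceed the 0.5 baseline of pixels with no feature.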
