Abstract
Although multi-modality medical image segmentation holds significant potential for enhancing the diagnosis and understanding of complex diseases by integrating diverse imaging modalities, existing methods predominantly rely on feature-level fusion strategies. We argue that the current feature-level fusion strategy is prone to semantic inconsistencies and misalignments across imaging modalities because it merges features at intermediate layers of a neural network without evaluative control. To mitigate this, we introduce Fuse4Seg, a novel multi-modality medical image segmentation method based on image-level fusion. Fuse4Seg is a bi-level learning framework designed to model the intertwined dependencies between medical image segmentation and medical image fusion. The image-level fusion process is employed to guide and enhance the segmentation results through a layered optimization approach. In turn, the knowledge gained from the segmentation module can effectively enhance the fusion module. This ensures that the resultant fused image is a coherent representation that accurately amalgamates information from all modalities. Moreover, we construct a BraTS-Fuse benchmark based on the BraTS dataset, which includes 2040 paired original images, multi-modal fusion images, and ground truth. This benchmark not only serves image-level medical segmentation but is also the largest dataset for medical image fusion to date. Extensive experiments on several public datasets and our benchmark demonstrate the superiority of our approach over prior state-of-the-art (SOTA) methodologies.