Abstract
The Segment Anything Model (SAM) demonstrates powerful zero-shot capabilities; however, its accuracy and robustness decrease significantly when applied to medical image segmentation. Existing methods address this issue through modality fusion, integrating textual and image information to provide more detailed priors. In this study, we argue that the granularity of the text and the domain gap affect the accuracy of the priors. Furthermore, the discrepancy between high-level abstract semantics and pixel-level boundary details in images can introduce noise into the fusion process. To address these issues, we propose Prior-Guided SAM (PG-SAM), which employs a fine-grained modality prior aligner to leverage specialized medical knowledge for better modality alignment. The core of our method lies in efficiently addressing the domain gap with fine-grained text from a medical LLM. It also improves the quality of the priors after modality alignment, ensuring more accurate segmentation. In addition, our decoder enhances the model's expressive capabilities through multi-level feature fusion and iterative mask optimizer operations, supporting unprompted learning. We also propose a unified pipeline that effectively supplies high-quality semantic information to SAM. Extensive experiments on the Synapse dataset demonstrate that the proposed PG-SAM achieves state-of-the-art performance. Our anonymous code is released at https://github.com/logan-0623/PG-SAM.