Abstract
A versatile medical image segmentation model applicable to imaging data collected with diverse equipment and protocols can facilitate model deployment and maintenance. However, building such a model typically requires a large, diverse, and fully annotated dataset, which is rarely available due to the labor-intensive and costly data curation. In this study, we develop a cost-efficient method by harnessing readily available data with partially or even sparsely annotated segmentation labels. We devise strategies for model self-disambiguation, prior knowledge incorporation, and imbalance mitigation to address challenges associated with inconsistently labeled data from various sources, including label ambiguity and imbalances across modalities, datasets, and segmentation labels. Experimental results on a multi-modal dataset compiled from eight different sources for abdominal organ segmentation have demonstrated our method's effectiveness and superior performance over alternative state-of-the-art methods, highlighting its potential for optimizing the use of existing annotated data and reducing the annotation efforts for new data to further enhance model capability.