Abstract
Biomedical image analysis is fundamental for biomedical discovery in cell biology, pathology, radiology, and many other biomedical domains. Holistic image analysis comprises interdependent subtasks such as segmentation, detection, and recognition of relevant objects. Here, we propose BiomedParse, a biomedical foundation model for image parsing that can jointly conduct segmentation, detection, and recognition for 82 object types across 9 imaging modalities. Through joint learning, we can improve accuracy for individual tasks and enable novel applications, such as segmenting all relevant objects in an image through a text prompt rather than requiring users to laboriously specify the bounding box for each object. We leveraged readily available natural-language labels or descriptions accompanying existing datasets and used GPT-4 to harmonize this noisy, unstructured text information with established biomedical object ontologies. We created a large dataset comprising over six million triples of image, segmentation mask, and textual description. On image segmentation, we showed that BiomedParse is broadly applicable, outperforming state-of-the-art methods on 102,855 test image-mask-label triples across 9 imaging modalities (everything). On object detection, which aims to locate a specific object of interest, BiomedParse again attained state-of-the-art performance, especially on objects with irregular shapes (everywhere). On object recognition, which aims to identify all objects in a given image along with their semantic types, we showed that BiomedParse can simultaneously segment and label all biomedical objects in an image (all at once). In summary, BiomedParse is an all-in-one tool for biomedical image analysis that jointly solves segmentation, detection, and recognition for all major biomedical image modalities, paving the path for efficient and accurate image-based biomedical discovery.