Abstract
Millions of melanocytic skin lesions are examined by pathologists each year, the majority of which concern common nevi (i.e., ordinary moles). While most of these lesions can be diagnosed in seconds, writing the corresponding pathology report is much more time-consuming. Automating part of the report writing could, therefore, alleviate the increasing workload of pathologists. In this work, we develop a vision-language model specifically for the pathology domain of cutaneous melanocytic lesions. The model follows the Contrastive Captioner framework and was trained and evaluated using a melanocytic lesion dataset of 42,512 H&E-stained whole slide images and 19,645 corresponding pathology reports. Our results show that the quality scores of model-generated reports were on par with those of pathologist-written reports for common nevi, as assessed by an expert pathologist in a reader study. While report generation proved to be more difficult for rare melanocytic lesion subtypes, the cross-modal retrieval performance for these cases was considerably better.