Abstract
Purpose: This study focuses on the development of automated text generation from radiology images, termed diagnostic captioning, to assist medical professionals in reducing clinical errors and improving productivity. The aim is to provide tools that enhance report quality and efficiency, which can significantly impact both clinical practice and deep learning research in the biomedical field. Methods: In our participation in the ImageCLEFmedical2024 Caption evaluation campaign, we explored caption prediction tasks using advanced Transformer-based models. We developed methods incorporating Transformer encoder-decoder and Query Transformer architectures. These models were trained and evaluated to generate diagnostic captions from radiology images. Results: Experimental evaluations demonstrated the effectiveness of our models, with the VisionDiagnostor-BioBART model achieving the highest BERTScore of 0.6267. This performance contributed to our team, DarkCow, achieving third place on the leaderboard. Conclusion: Our diagnostic captioning models show great promise in aiding medical professionals by generating high-quality reports efficiently. This approach can facilitate better data processing and performance optimization in medical imaging departments, ultimately benefiting healthcare delivery.