Abstract
Traditional Chinese medicine (TCM) tongue diagnosis, while clinically valuable, faces standardization challenges due to subjective interpretation and inconsistent imaging protocols, compounded by the lack of large-scale, annotated datasets for AI development. To address this gap, we present the first specialized dataset for AI-driven TCM tongue diagnosis, comprising 6,719 high-quality images captured under standardized conditions and annotated with 20 pathological symptom categories (averaging 2.54 clinically validated labels per image, all verified by licensed TCM practitioners). The dataset supports multiple annotation formats (COCO, TXT, XML) for broad usability and has been benchmarked using nine deep learning models (YOLOv5/v7/v8 variants, SSD, and MobileNetV2) to demonstrate its utility for AI development. This resource provides a critical foundation for advancing reliable computational tools in TCM, bridging the data shortage that has hindered progress in the field, and facilitating the integration of AI into both research and clinical practice through standardized, high-quality diagnostic data.