Integrating Vision and Location with Transformers: A Multimodal Deep Learning Framework for Medical Wound Analysis

  • 2025-04-14 18:39:18
  • Ramin Mousa, Hadis Taherinia, Khabiba Abdiyeva, Amir Ali Bengari, Mohammadmahdi Vahediahmar

Abstract

Effective recognition of acute and difficult-to-heal wounds is a necessary step in wound diagnosis. An efficient classification model can help wound specialists classify wound types at lower financial and time cost and can also help in deciding on the optimal treatment method. Traditional machine learning models are limited by their reliance on feature selection and are usually too cumbersome for accurate recognition. Recently, deep learning (DL) has emerged as a powerful tool in wound diagnosis. Although DL seems promising for wound type recognition, there is still considerable scope for improving the efficiency and accuracy of the model. In this study, a DL-based multimodal classifier was developed using wound images and their corresponding locations to classify them into multiple classes, including diabetic, pressure, surgical, and venous ulcers. A body map was also created to provide location data, which can help wound specialists label wound locations more effectively. The model uses a Vision Transformer to extract hierarchical features from input images, a Discrete Wavelet Transform (DWT) layer to capture low- and high-frequency components, and a Transformer to extract spatial features. The number of neurons and the weight vector were optimized using three swarm-based optimization techniques: Monster Gorilla Toner (MGTO), Improved Gray Wolf Optimization (IGWO), and the Fox Optimization Algorithm. The evaluation results show that optimizing the weight vector with these algorithms can increase diagnostic accuracy, making this an effective approach for wound detection. In the classification using the original body map, the proposed model achieved an accuracy of 0.8123 using image data and an accuracy of 0.8007 using a combination of image data and wound location. The accuracy of the model in combination with the optimization models varied from 0.7801 to 0.8342.
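
To make the DWT step concrete, the following is a minimal sketch of a single-level 2D wavelet decomposition that splits an image into one low-frequency and three high-frequency sub-bands. It is an illustration only and not the authors' implementation: the Haar wavelet, the function name haar_dwt2, the sub-band labels, and the plain nested-list image format are all assumptions made for this example, since the abstract does not state which wavelet or library the model uses.

```python
import math


def haar_dwt2(image):
    """Single-level 2D Haar DWT of a 2D list with even height and width.

    Hypothetical helper for illustration; returns a dict of four sub-bands,
    each half the height and half the width of the input.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("height and width must both be even")
    s = math.sqrt(2.0)

    # Step 1: combine horizontally adjacent pixels.
    # Sums give a low-pass band, differences give a high-pass band.
    lo = [[(r[j] + r[j + 1]) / s for j in range(0, cols, 2)] for r in image]
    hi = [[(r[j] - r[j + 1]) / s for j in range(0, cols, 2)] for r in image]

    # Step 2: combine vertically adjacent rows of each band the same way.
    def pair_rows(band, sign):
        return [
            [(a + sign * b) / s for a, b in zip(band[i], band[i + 1])]
            for i in range(0, rows, 2)
        ]

    return {
        "LL": pair_rows(lo, +1),  # low-pass in both directions (approximation)
        "LH": pair_rows(lo, -1),  # low-pass horizontally, high-pass vertically
        "HL": pair_rows(hi, +1),  # high-pass horizontally, low-pass vertically
        "HH": pair_rows(hi, -1),  # high-pass in both directions
    }


if __name__ == "__main__":
    # Toy 4x4 "image" with alternating columns, i.e. strong horizontal variation.
    stripes = [[0.0, 1.0, 0.0, 1.0] for _ in range(4)]
    for name, band in haar_dwt2(stripes).items():
        print(name, band)
```

On a perfectly flat image all three high-frequency sub-bands are zero, so only regions with edges or texture contribute to them. This is the sense in which a DWT layer separates coarse structure from fine detail before later layers process the features.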

 
