EHSAN: Leveraging ChatGPT in a Hybrid Framework for Arabic Aspect-Based Sentiment Analysis in Healthcare

Abstract

Arabic-language patient feedback remains under-analysed because dialect diversity and scarce aspect-level sentiment labels hinder automated assessment. To address this gap, we introduce EHSAN, a data-centric hybrid pipeline that merges ChatGPT pseudo-labelling with targeted human review to build the first explainable Arabic aspect-based sentiment dataset for healthcare. Each sentence is annotated with an aspect and sentiment label (positive, negative, or neutral), forming a pioneering Arabic dataset aligned with healthcare themes, with ChatGPT-generated rationales provided for each label to enhance transparency. To evaluate the impact of annotation quality on model performance, we created three versions of the training data: a fully supervised set with all labels reviewed by humans, a semi-supervised set with 50% human review, and an unsupervised set with only machine-generated labels. We fine-tuned two transformer models on these datasets for both aspect and sentiment classification. Experimental results show that our Arabic-specific model achieved high accuracy even with minimal human supervision, reflecting only a minor performance drop when using ChatGPT-only labels. Reducing the number of aspect classes notably improved classification metrics across the board. These findings demonstrate an effective, scalable approach to Arabic aspect-based sentiment analysis (SA) in healthcare, combining large language model annotation with human expertise to produce a robust and explainable dataset. Future directions include generalisation across hospitals, prompt refinement, and interpretable data-driven modelling.
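
The three training-set variants described in the abstract could be assembled along the following lines. This is a minimal sketch and not the authors' code: the record fields (`text`, `machine_label`, `human_label`), the function name `build_training_variants`, and the random choice of which sentences receive human review are all assumptions made for illustration. The abstract does not state how the reviewed 50% was selected.

```python
import random


def build_training_variants(records, review_fraction=0.5, seed=0):
    """Build three label sets from the same sentences.

    records: list of dicts with the hypothetical keys
      'text', 'machine_label' (LLM pseudo-label), 'human_label' (reviewed label).
    """
    rng = random.Random(seed)
    n_review = round(len(records) * review_fraction)
    # Assumption: the human-reviewed subset is drawn uniformly at random.
    reviewed_idx = set(rng.sample(range(len(records)), n_review))

    def pick(rec, use_human):
        label = rec["human_label"] if use_human else rec["machine_label"]
        return {"text": rec["text"], "label": label}

    return {
        # Every label comes from human review.
        "fully_supervised": [pick(r, True) for r in records],
        # Reviewed labels for the sampled subset, machine labels elsewhere.
        "semi_supervised": [
            pick(r, i in reviewed_idx) for i, r in enumerate(records)
        ],
        # Machine-generated labels only.
        "unsupervised": [pick(r, False) for r in records],
    }
```

Holding the sentences fixed and varying only the label source means that any difference between models trained on the three sets can be attributed to annotation quality rather than to the data sample.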
