A Multilingual Sentiment Lexicon for Low-Resource Language Translation using Large Languages Models and Explainable AI

  • 2024-11-06 23:41:18
  • Melusi Malinga, Isaac Lupanda, Mike Wa Nkongolo, Phil van Deventer

Abstract

South Africa and the Democratic Republic of Congo (DRC) present a complex linguistic landscape with languages such as Zulu, Sepedi, Afrikaans, French, English, and Tshiluba (Ciluba), which creates unique challenges for AI-driven translation and sentiment analysis systems due to a lack of accurately labeled data. This study seeks to address these challenges by developing a multilingual lexicon designed for French and Tshiluba, now expanded to include translations in English, Afrikaans, Sepedi, and Zulu. The lexicon enhances cultural relevance in sentiment classification by integrating language-specific sentiment scores. A comprehensive testing corpus is created to support translation and sentiment analysis tasks, with machine learning models such as Random Forest, Support Vector Machine (SVM), Decision Trees, and Gaussian Naive Bayes (GNB) trained to predict sentiment across low-resource languages (LRLs). Among them, the Random Forest model performed particularly well, capturing sentiment polarity and handling language-specific nuances effectively. Furthermore, Bidirectional Encoder Representations from Transformers (BERT), a Large Language Model (LLM), is applied to predict context-based sentiment, achieving 99% accuracy and 98% precision and outperforming the other models. The BERT predictions were clarified using Explainable AI (XAI), improving transparency and fostering confidence in sentiment classification. Overall, findings demonstrate that the proposed lexicon and machine learning models significantly enhance translation and sentiment analysis for LRLs in South Africa and the DRC, laying a foundation for future AI models that support underrepresented languages, with applications across education, governance, and business in multilingual contexts.
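
To make the idea of a multilingual lexicon with language-specific sentiment scores concrete, the following is a minimal sketch and not the authors' implementation. The entry layout, the language codes, the example words, the numeric scores, and the helper names (`LexiconEntry`, `score_text`, `polarity`) are all illustrative assumptions; the actual lexicon's schema and values are not given in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexiconEntry:
    """One concept with per-language surface forms and sentiment scores.

    Hypothetical layout: both dicts are keyed by a language code.
    Scores are assumed to lie in [-1.0, 1.0].
    """
    translations: dict
    scores: dict


# Illustrative placeholder entries only; scores are made up for the sketch.
LEXICON = [
    LexiconEntry(
        translations={"fr": "bon", "en": "good", "af": "goed"},
        scores={"fr": 0.6, "en": 0.7, "af": 0.6},
    ),
    LexiconEntry(
        translations={"fr": "mauvais", "en": "bad", "af": "sleg"},
        scores={"fr": -0.6, "en": -0.7, "af": -0.7},
    ),
]


def build_index(lexicon, lang):
    """Map each word of one language to that language's sentiment score."""
    return {
        entry.translations[lang]: entry.scores[lang]
        for entry in lexicon
        if lang in entry.translations and lang in entry.scores
    }


def score_text(text, lang, lexicon=LEXICON):
    """Average the scores of the lexicon words found in a whitespace-tokenized text."""
    index = build_index(lexicon, lang)
    hits = [index[token] for token in text.lower().split() if token in index]
    return sum(hits) / len(hits) if hits else 0.0


def polarity(score, threshold=0.1):
    """Turn a numeric score into a coarse label (the threshold is an assumption)."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"
```

Under these assumptions, `polarity(score_text("un bon film", "fr"))` labels the French phrase using the French score for "bon" rather than a score borrowed from its English translation, which is the sense in which the scores are language-specific. Labels or scores produced this way could then serve as inputs to classifiers such as the Random Forest model mentioned above.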

 
