BESSTIE: A Benchmark for Sentiment and Sarcasm Classification for Varieties of English

Abstract

Despite large language models (LLMs) being known to exhibit bias against non-standard language varieties, there are no known labelled datasets for sentiment analysis of English. To address this gap, we introduce BESSTIE, a benchmark for sentiment and sarcasm classification for three varieties of English: Australian (en-AU), Indian (en-IN), and British (en-UK). We collect datasets for these language varieties using two methods: location-based filtering for Google Places reviews and topic-based filtering for Reddit comments. To assess whether the dataset accurately represents these varieties, we conduct two validation steps: (a) manual annotation of language varieties and (b) automatic language variety prediction. Native speakers of the language varieties manually annotate the datasets with sentiment and sarcasm labels. We perform an additional annotation exercise to validate the reliability of the annotated labels. Subsequently, we fine-tune nine LLMs (representing a range of encoder/decoder and mono/multilingual models) on these datasets, and evaluate their performance on the two tasks. Our results show that the models consistently perform better on inner-circle varieties (i.e., en-AU and en-UK) than on en-IN, particularly for sarcasm classification. We also report challenges in cross-variety generalisation, highlighting the need for language variety-specific datasets such as ours. BESSTIE promises to be a useful evaluative benchmark for future research in equitable LLMs, specifically in terms of language varieties. The BESSTIE dataset is publicly available at: https://huggingface.co/datasets/unswnlporg/BESSTIE.
