Women, Infamous, and Exotic Beings: What Honorific Usages in Wikipedia Reflect on the Cross-Cultural Sociolinguistic Norms?

Abstract

Wikipedia, as a massively multilingual, community-driven platform, is a valuable resource for Natural Language Processing (NLP), yet the consistency of honorific usage in honorific-rich languages remains underexplored. Honorifics, subtle yet profound linguistic markers, encode social hierarchies, politeness norms, and cultural values, but Wikipedia's editorial guidelines lack clear standards for their usage in languages where such forms are grammatically and socially prevalent. This paper addresses this gap through a large-scale analysis of third-person honorific pronouns and verb forms in Hindi and Bengali Wikipedia articles. Using Large Language Models (LLM), we automatically annotate 10,000 articles per language for honorific usage and socio-demographic features such as gender, age, fame, and cultural origin. We investigate: (i) the consistency of honorific usage across articles, (ii) how inconsistencies correlate with socio-cultural factors, and (iii) the presence of explicit or implicit biases across languages. We find that honorific usage is consistently more common in Bengali than Hindi, while non-honorific forms are more frequent for infamous, juvenile, and exotic entities in both. Notably, gender bias emerges in both languages, particularly in Hindi, where men are more likely to receive honorifics than women. Our analysis highlights the need for Wikipedia to develop language-specific editorial guidelines for honorific usage.
