Socially Aware Bias Measurements for Hindi Language Representations

Abstract

Language representations are an efficient tool used across NLP, but they are rife with encoded societal biases. These biases have been studied extensively, but with a primary focus on English language representations and biases common in the context of Western society. In this work, we investigate the biases present in Hindi language representations, such as caste- and religion-associated biases. We demonstrate how biases are unique to specific language representations based on the history and culture of the region in which they are widely spoken, and also how the same societal bias (such as binary gender-associated bias), when investigated across languages, is encoded by different words and text spans. With this work, we emphasize the necessity of social awareness, along with linguistic and grammatical artefacts, when modeling language representations in order to understand the biases encoded.
