Abstract
Large language models (LLMs) often inherit and amplify social biases embedded in their training data. A prominent social bias is gender bias. In this regard, prior work has mainly focused on gender stereotyping bias (the association of specific roles or traits with a particular gender) in English and on evaluating gender bias in model embeddings or generated outputs. In contrast, gender representation bias (the unequal frequency of references to individuals of different genders) in the training corpora has received less attention. Yet such imbalances in the training data constitute an upstream source of bias that can propagate and intensify throughout the entire model lifecycle. To fill this gap, we propose a novel LLM-based method to detect and quantify gender representation bias in LLM training data in gendered languages, where grammatical gender challenges the applicability of methods developed for English. By leveraging the LLMs' contextual understanding, our approach automatically identifies and classifies person-referencing words in gendered language corpora. Applied to four Spanish-English benchmarks and five Valencian corpora, our method reveals substantial male-dominant imbalances. We show that such biases in training data affect model outputs but can, surprisingly, be mitigated through small-scale training on datasets that are biased towards the opposite gender. Our findings highlight the need for corpus-level gender bias analysis in multilingual NLP. We make our code and data publicly available.