Evaluating Gender Bias of Pre-trained Language Models in Natural Language Inference by Considering All Labels

Abstract

Discriminatory social biases, including gender biases, have been found in Pre-trained Language Models (PLMs). In Natural Language Inference (NLI), recent bias evaluation methods have observed biased inferences from the outputs of a particular label such as neutral or entailment. However, since different biased inferences can be associated with different output labels, it is inaccurate for a method to rely on one label. In this work, we propose an evaluation method that considers all labels in the NLI task. We create evaluation data and assign them into groups based on their expected biased output labels. Then, we define a bias measure based on the corresponding label output of each data group. In the experiment, we propose a meta-evaluation method for NLI bias measures, and then use it to confirm that our measure can evaluate bias more accurately than the baseline. Moreover, we show that our evaluation method is applicable to multiple languages by conducting the meta-evaluation on PLMs in three different languages: English, Japanese, and Chinese. Finally, we evaluate PLMs of each language to confirm their bias tendency. To our knowledge, we are the first to build evaluation datasets and measure the bias of PLMs from the NLI task in Japanese and Chinese.
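
The idea of grouping evaluation data by expected biased label and scoring each group on that label can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the group names, the toy predictions, and in particular the unweighted mean used to combine the groups are hypothetical, since the abstract does not state the exact formula of the measure.

```python
from collections import Counter

# The three standard NLI output labels.
LABELS = ("entailment", "neutral", "contradiction")


def label_rate(predictions, label):
    """Fraction of predictions equal to `label`."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    if label not in LABELS:
        raise ValueError(f"unknown label: {label}")
    return Counter(predictions)[label] / len(predictions)


def group_scores(groups):
    """Per-group rate of the label that group is expected to elicit.

    `groups` maps a (hypothetical) group name to a pair
    (expected_label, list_of_model_predictions).
    """
    return {
        name: label_rate(preds, expected)
        for name, (expected, preds) in groups.items()
    }


def aggregate_score(groups):
    """Unweighted mean of the per-group rates.

    ASSUMPTION: this aggregation is illustrative only and is not
    taken from the paper.
    """
    scores = group_scores(groups)
    return sum(scores.values()) / len(scores)


# Toy data with hypothetical group names and predictions.
example_groups = {
    "group_a": ("entailment", ["entailment", "neutral", "entailment", "entailment"]),
    "group_b": ("contradiction", ["contradiction", "neutral"]),
    "group_c": ("neutral", ["neutral", "neutral", "entailment", "neutral"]),
}

if __name__ == "__main__":
    print(group_scores(example_groups))
    print(aggregate_score(example_groups))
```

Unlike a single-label measure, this sketch reads a different output label for each group, so a model whose bias surfaces as contradiction in one group and as entailment in another is still captured.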
