Identifying and Correcting Label Bias in Machine Learning

Abstract

Datasets often contain biases which unfairly disadvantage certain groups, and classifiers trained on such datasets can inherit these biases. In this paper, we provide a mathematical formulation of how this bias can arise. We do so by assuming the existence of underlying, unknown, and unbiased labels which are overwritten by an agent who intends to provide accurate labels but may have biases against certain groups. Despite the fact that we only observe the biased labels, we are able to show that the bias may nevertheless be corrected by re-weighting the data points without changing the labels. We show, with theoretical guarantees, that training on the re-weighted dataset corresponds to training on the unobserved but unbiased labels, thus leading to an unbiased machine learning classifier. Our procedure is fast and robust and can be used with virtually any learning algorithm. We evaluate on a number of standard machine learning fairness datasets and a variety of fairness notions, finding that our method outperforms standard approaches in achieving fair classification.
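
To make the idea of correcting bias by re-weighting (rather than relabeling) concrete, the following is a minimal illustrative sketch. It is not the procedure developed in this paper: it uses the simpler, classical reweighing rule w(g, y) = P(g) P(y) / P(g, y), which makes group membership and label statistically independent in the weighted data. The function names, the group identifiers, and the toy data are all hypothetical.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Return one weight per example: w(g, y) = P(g) * P(y) / P(g, y).

    Illustrative stand-in only, not the weighting scheme derived in the paper.
    Labels are left untouched; only the example weights change.
    """
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        group_counts[g] * label_counts[y] / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(groups, labels, weights, group):
    """Weighted fraction of positive labels within one group."""
    num = sum(w for g, y, w in zip(groups, labels, weights) if g == group and y == 1)
    den = sum(w for g, w in zip(groups, weights) if g == group)
    return num / den


# Hypothetical toy data: group "a" is labeled positive far more often than "b".
groups = ["a"] * 4 + ["b"] * 4
labels = [1, 1, 1, 0, 1, 0, 0, 0]
weights = reweighing_weights(groups, labels)

rate_a = weighted_positive_rate(groups, labels, weights, "a")
rate_b = weighted_positive_rate(groups, labels, weights, "b")
```

Under these weights the two groups have the same weighted positive rate even though no label was modified. Weights of this kind can be handed to any learner that accepts per-example weights, for instance through the `sample_weight` argument that many scikit-learn estimators take in `fit`.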
