The Fairness of Risk Scores Beyond Classification: Bipartite Ranking and the xAUC Metric

Abstract

Where machine-learned predictive risk scores inform high-stakes decisions, such as bail and sentencing in criminal justice, fairness has been a serious concern. Recent work has characterized the disparate impact that such risk scores can have when used for a binary classification task and provided tools to audit and adjust resulting classifiers. This may not account, however, for the more diverse downstream uses of risk scores and their non-binary nature. To better account for this, in this paper, we investigate the fairness of predictive risk scores from the point of view of a bipartite ranking task, where one seeks to rank positive examples higher than negative ones. We introduce the xAUC disparity as a metric to assess the disparate impact of risk scores and define it as the difference in the probabilities of ranking a random positive example from one protected group above a negative one from another group and vice versa. We provide a decomposition of bipartite ranking loss into components that involve the discrepancy and components that involve pure predictive ability within each group. We further provide an interpretation of the xAUC discrepancy in terms of resource allocation fairness and make connections to existing fairness metrics and adjustments. We assess xAUC empirically on datasets in recidivism prediction, income prediction, and cardiac arrest prediction, where it describes disparities that are not evident from simply comparing within-group predictive performance.
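
To make the definition of the xAUC disparity concrete, the following is a minimal sketch of an empirical estimate in plain Python. It is not taken from the paper: the function names `xauc` and `xauc_disparity`, the half-credit treatment of tied scores, and the toy data are all assumptions made for illustration.

```python
def xauc(scores, labels, groups, pos_group, neg_group):
    """Empirical probability that a positive example from pos_group
    is scored above a negative example from neg_group.
    Ties count as one half (an assumption, mirroring the usual AUC convention)."""
    pos = [s for s, y, g in zip(scores, labels, groups) if y == 1 and g == pos_group]
    neg = [s for s, y, g in zip(scores, labels, groups) if y == 0 and g == neg_group]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Compare every cross-group (positive, negative) pair.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def xauc_disparity(scores, labels, groups, a, b):
    """Difference between the two cross-group ranking probabilities."""
    return xauc(scores, labels, groups, a, b) - xauc(scores, labels, groups, b, a)


# Hypothetical toy data: four examples in each of two groups.
scores = [0.9, 0.4, 0.6, 0.2, 0.7, 0.3, 0.8, 0.5]
labels = [1, 1, 0, 0, 1, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

xauc_ab = xauc(scores, labels, groups, "a", "b")           # positives of a vs negatives of b
xauc_ba = xauc(scores, labels, groups, "b", "a")           # positives of b vs negatives of a
disparity = xauc_disparity(scores, labels, groups, "a", "b")
print(xauc_ab, xauc_ba, disparity)
```

On this toy data the positives of group a outrank the negatives of group b in 2 of 4 pairs, while the positives of group b outrank the negatives of group a in 3 of 4 pairs, so the estimated disparity is negative. The pairwise loop is quadratic in the number of examples; for large datasets a rank-based computation would likely be preferable.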
