Abstract
We propose selective debiasing, an inference-time safety mechanism designed to enhance the overall model quality in terms of prediction performance and fairness, especially in scenarios where retraining the model is impractical. The method draws inspiration from selective classification, where at inference time, predictions with low quality, as indicated by their uncertainty scores, are discarded. In our approach, we identify the potentially biased model predictions and, instead of discarding them, we remove bias from these predictions using LEACE, a post-processing debiasing method. To select problematic predictions, we propose a bias quantification approach based on KL divergence, which achieves better results than standard uncertainty quantification methods. Experiments on text classification datasets with encoder-based classification models demonstrate that selective debiasing helps to reduce the performance gap between post-processing methods and debiasing techniques from the at-training and pre-processing categories.
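
The selection step described above can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: the bias score is taken to be the KL divergence between the original model's class distribution and the distribution obtained after debiasing, and a fixed fraction of the highest-scoring instances is selected. The debiasing step itself (LEACE in the abstract) is not implemented here; the debiased probabilities are passed in as inputs. The names `kl_divergence`, `selective_debias`, and `fraction` are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence, Tuple


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same classes."""
    # Terms with p_i == 0 contribute nothing; eps guards against log(0) in q.
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def selective_debias(
    orig_probs: Sequence[Sequence[float]],
    debiased_probs: Sequence[Sequence[float]],
    fraction: float = 0.2,
) -> Tuple[List[int], List[int], List[float]]:
    """Replace the most suspicious predictions with their debiased versions.

    Assumption (not stated in the abstract): the bias score of an instance is
    KL(original prediction || debiased prediction), and the top `fraction`
    of instances by that score are treated as potentially biased.
    """
    scores = [kl_divergence(p, q) for p, q in zip(orig_probs, debiased_probs)]
    n_select = int(round(fraction * len(scores)))

    # Indices sorted by descending bias score; the first n_select are selected.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    selected = set(ranked[:n_select])

    predictions = []
    for i, (p, q) in enumerate(zip(orig_probs, debiased_probs)):
        probs = q if i in selected else p  # debiased output only where selected
        predictions.append(max(range(len(probs)), key=probs.__getitem__))
    return predictions, sorted(selected), scores


if __name__ == "__main__":
    # Toy probabilities, made up purely to exercise the code.
    orig = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.55, 0.45]]
    debiased = [[0.9, 0.1], [0.3, 0.7], [0.2, 0.8], [0.45, 0.55]]
    preds, chosen, _ = selective_debias(orig, debiased, fraction=0.25)
    print(preds, chosen)
```

In this toy input only the second instance changes substantially under debiasing, so it is the one whose prediction is replaced; all other instances keep the original model's output, which mirrors the idea of correcting rather than discarding the selected predictions.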