Abstract
Transparency and explainability are two extremely important aspects to beconsidered when employing black-box machine learning models in high-stakeapplications. Providing counterfactual explanations is one way of fulfillingthis requirement. However, this also poses a threat to the privacy of both theinstitution that is providing the explanation as well as the user who isrequesting it. In this work, we propose multiple schemes inspired by privateinformation retrieval (PIR) techniques which ensure the \emph{user's privacy}when retrieving counterfactual explanations. We present a scheme whichretrieves the \emph{exact} nearest neighbor counterfactual explanation from adatabase of accepted points while achieving perfect (information-theoretic)privacy for the user. While the scheme achieves perfect privacy for the user,some leakage on the database is inevitable which we quantify using a mutualinformation based metric. Furthermore, we propose strategies to reduce thisleakage to achieve an advanced degree of database privacy. We extend theseschemes to incorporate user's preference on transforming their attributes, sothat a more actionable explanation can be received. Since our schemes rely onfinite field arithmetic, we empirically validate our schemes on real datasetsto understand the trade-off between the accuracy and the finite field sizes.Finally, we present numerical results to support our theoretical findings, andcompare the database leakage of the proposed schemes.