What Hides behind Unfairness? Exploring Dynamics Fairness in Reinforcement Learning

Abstract

In sequential decision-making problems involving sensitive attributes like race and gender, reinforcement learning (RL) agents must carefully consider long-term fairness while maximizing returns. Recent works have proposed many different types of fairness notions, but how unfairness arises in RL problems remains unclear. In this paper, we address this gap in the literature by investigating the sources of inequality through a causal lens. We first analyse the causal relationships governing the data generation process and decompose the effect of sensitive attributes on long-term well-being into distinct components. We then introduce a novel notion called dynamics fairness, which explicitly captures the inequality stemming from environmental dynamics, distinguishing it from those induced by decision-making or inherited from the past. This notion requires evaluating the expected changes in the next state and the reward induced by changing the value of the sensitive attribute while holding everything else constant. To quantitatively evaluate this counterfactual concept, we derive identification formulas that allow us to obtain reliable estimations from data. Extensive experiments demonstrate the effectiveness of the proposed techniques in explaining, detecting, and reducing inequality in reinforcement learning.
