Abstract
In many real-world applications of reinforcement learning (RL), deployed policies have varied impacts on different stakeholders, creating challenges in reaching consensus on how to effectively aggregate their preferences. Generalized $p$-means form a widely used class of social welfare functions for this purpose, with broad applications in fair resource allocation, AI alignment, and decision-making. This class includes well-known welfare functions such as Egalitarian, Nash, and Utilitarian welfare. However, selecting the appropriate social welfare function is challenging for decision-makers, as the structure and outcomes of optimal policies can be highly sensitive to the choice of $p$. To address this challenge, we study the concept of an $\alpha$-approximate portfolio in RL, a set of policies that are approximately optimal across the family of generalized $p$-means for all $p \in [-\infty, 1]$. We propose algorithms to compute such portfolios and provide theoretical guarantees on the trade-offs among approximation factor, portfolio size, and computational efficiency. Experimental results on synthetic and real-world datasets demonstrate the effectiveness of our approach in summarizing the policy space induced by varying $p$ values, empowering decision-makers to navigate this landscape more effectively.
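To make the objects in this abstract concrete, the following is a minimal illustrative sketch, not the algorithms proposed in this work. It computes the generalized $p$-mean of a vector of strictly positive stakeholder utilities and checks a portfolio against an assumed multiplicative reading of $\alpha$-approximation: for each tested $p$, some policy in the portfolio attains at least an $\alpha$ fraction of the best achievable $p$-mean. The function names `p_mean` and `is_alpha_approximate`, the finite candidate set, and the finite grid of $p$ values are all hypothetical simplifications; the guarantee in the text covers every $p \in [-\infty, 1]$, whereas this check only covers the grid it is given.

```python
import math


def p_mean(values, p):
    """Generalized p-mean of strictly positive values, for p in [-inf, 1]."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and strictly positive")
    n = len(values)
    if p == -math.inf:
        return min(values)  # Egalitarian welfare (limit p -> -inf)
    if p == 0:
        # Nash welfare: geometric mean (limit p -> 0)
        return math.exp(sum(math.log(v) for v in values) / n)
    # p = 1 gives Utilitarian welfare (arithmetic mean)
    return (sum(v ** p for v in values) / n) ** (1.0 / p)


def is_alpha_approximate(candidates, portfolio, alpha, p_grid):
    """Check an assumed alpha-approximation condition on a finite grid of p.

    candidates: dict mapping a policy name to its utility vector
                (hypothetical finite stand-in for the policy space).
    portfolio:  iterable of policy names drawn from candidates.
    """
    for p in p_grid:
        best = max(p_mean(u, p) for u in candidates.values())
        best_in_portfolio = max(p_mean(candidates[name], p) for name in portfolio)
        if best_in_portfolio < alpha * best:
            return False
    return True


# Hypothetical utility vectors for three policies and two stakeholders.
candidates = {"a": [1.0, 9.0], "b": [4.0, 4.0], "c": [3.0, 6.0]}
p_grid = [-math.inf, -1.0, 0.0, 0.5, 1.0]
```

On these made-up numbers, policy `a` is best at $p = 1$, policy `b` at $p = -\infty$, and policy `c` at $p = 0$, which illustrates how the optimal policy can shift with $p$ and why a small portfolio such as `["a", "b"]` may still be near-optimal across the grid.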