Abstract
Research in multi-objective reinforcement learning (MORL) has introduced the utility-based paradigm, which makes use of both environmental rewards and a function that defines the utility derived by the user from those rewards. In this paper we extend this paradigm to the context of single-objective reinforcement learning (RL), and outline multiple potential benefits including the ability to perform multi-policy learning across tasks relating to uncertain objectives, risk-aware RL, discounting, and safe RL. We also examine the algorithmic implications of adopting a utility-based approach.