Abstract
Risk-averse reinforcement learning finds application in various high-stakes fields. Unlike classical reinforcement learning, which aims to maximize expected returns, risk-averse agents choose policies that minimize risk, occasionally sacrificing expected value. These preferences can be framed through utility theory. We focus on the specific case of the exponential utility function, where one can derive the Bellman equations and employ various reinforcement learning algorithms with few modifications. To address this, we introduce to the broad machine learning community a numerically stable and mathematically sound loss function based on the Itakura-Saito divergence for learning state-value and action-value functions. We evaluate the Itakura-Saito loss function against established alternatives, both theoretically and empirically. In the experimental section, we explore multiple scenarios, some with known analytical solutions, and show that the considered loss function outperforms the alternatives.