Risk-Sensitive Reinforcement Learning

Abstract

The classic objective in a reinforcement learning (RL) problem is to find a policy that minimizes, in expectation, a long-run objective such as the infinite-horizon cumulative discounted or long-run average cost. In many practical applications, optimizing the expected value alone is not sufficient, and it may be necessary to include a risk measure in the optimization process, either in the objective or as a constraint. Various risk measures have been proposed in the literature, e.g., variance, exponential utility, percentile performance, chance constraints, value-at-risk (quantile), conditional value-at-risk, coherent risk measures, prospect theory and its later enhancement, cumulative prospect theory. In this article, we focus on the combination of risk criteria and reinforcement learning in a constrained optimization framework, i.e., a setting where the goal is to find a policy that optimizes the usual objective of infinite-horizon discounted/average cost, while ensuring that an explicit risk constraint is satisfied. We introduce the risk-constrained RL framework, cover popular risk measures based on variance, conditional value-at-risk, and chance constraints, and present a template for a risk-sensitive RL algorithm. Next, we study risk-sensitive RL with the objective of minimizing risk in an unconstrained framework, and cover cumulative prospect theory and coherent risk measures as special cases. We survey some of the recent work on this topic, covering problems encompassing discounted cost, average cost, and stochastic shortest path settings, together with the aforementioned risk measures, in constrained as well as unconstrained frameworks. This non-exhaustive survey is aimed at giving a flavor of the challenges involved in solving risk-sensitive RL problems, and outlining some potential future research directions.
