Abstract
We consider the problem of learning models for risk-sensitive reinforcement learning. We theoretically demonstrate that proper value equivalence, a method of learning models which can be used to plan optimally in the risk-neutral setting, is not sufficient to plan optimally in the risk-sensitive setting. We leverage distributional reinforcement learning to introduce two new notions of model equivalence, one which is general and can be used to plan for any risk measure, but is intractable; and a practical variation which allows one to choose which risk measures they may plan optimally for. We demonstrate how our framework can be used to augment any model-free risk-sensitive algorithm, and provide both tabular and large-scale experiments to demonstrate its ability.