Abstract
While reinforcement learning has shown experimental success in a number of applications, it is known to be sensitive to noise and to perturbations in the parameters of the system, which leads to high variance in the total reward across episodes in slightly different environments. To introduce robustness, as well as sample efficiency, risk-sensitive reinforcement learning methods are being actively studied. In this work, we provide a definition of robust reinforcement learning policies and formulate a risk-sensitive reinforcement learning problem that approximates them by optimizing a modified objective based on exponential criteria. In particular, we study a model-free risk-sensitive variation of the widely used Monte Carlo Policy Gradient algorithm and introduce a novel risk-sensitive online Actor-Critic algorithm based on solving a multiplicative Bellman equation using stochastic approximation updates. Analytical results suggest that the use of exponential criteria generalizes commonly used ad hoc regularization approaches, improves sample efficiency, and introduces robustness with respect to perturbations in the model parameters and the environment. The implementation, performance, and robustness properties of the proposed methods are evaluated in simulated experiments.
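To make the idea of an exponential criterion concrete, the sketch below assumes the common form J_beta = (1/beta) log E[exp(beta R)], where R is the total reward of an episode and beta is a risk parameter, with beta < 0 taken as risk-averse. The exact objective and sign convention used in this work may differ, and the helper names `exponential_criterion` and `mean_variance_approx` are hypothetical. The code estimates the criterion from sampled episode returns and compares it with its second-order expansion E[R] + (beta/2) Var[R], which shows how, for small |beta|, the exponential criterion behaves like a variance-regularized objective.

```python
import math
from typing import Sequence


def exponential_criterion(returns: Sequence[float], beta: float) -> float:
    """Sample estimate of (1/beta) * log E[exp(beta * R)] (hypothetical helper).

    Assumed convention: beta < 0 is risk-averse, beta > 0 is risk-seeking,
    and beta == 0 falls back to the plain mean return (the risk-neutral limit).
    """
    if len(returns) == 0:
        raise ValueError("need at least one sampled return")
    if beta == 0.0:
        return sum(returns) / len(returns)
    # Log-mean-exp with the max subtracted, to avoid overflow/underflow in exp.
    scaled = [beta * r for r in returns]
    m = max(scaled)
    log_mean_exp = m + math.log(sum(math.exp(s - m) for s in scaled) / len(scaled))
    return log_mean_exp / beta


def mean_variance_approx(returns: Sequence[float], beta: float) -> float:
    """Second-order expansion E[R] + (beta / 2) * Var[R] of the criterion above."""
    n = len(returns)
    mean = sum(returns) / n
    # Population variance of the samples (matches the empirical distribution).
    var = sum((r - mean) ** 2 for r in returns) / n
    return mean + 0.5 * beta * var


if __name__ == "__main__":
    episode_returns = [1.0, 3.0]  # illustrative numbers, not experimental data
    for beta in (-0.5, -0.05, 0.0, 0.5):
        print(beta,
              exponential_criterion(episode_returns, beta),
              mean_variance_approx(episode_returns, beta))
```

With the illustrative returns [1.0, 3.0] (mean 2, variance 1) and beta = -0.5, the criterion evaluates to about 1.760 while the mean-variance expansion gives 1.75: both penalize the spread of returns, and the two agree more closely as |beta| shrinks. A positive beta instead rewards spread and pushes the value above the mean.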