Reinforcement Learning for Jump-Diffusions, with Financial Applications

Abstract

We study continuous-time reinforcement learning (RL) for stochastic control in which system dynamics are governed by jump-diffusion processes. We formulate an entropy-regularized exploratory control problem with stochastic policies to capture the exploration--exploitation balance essential for RL. Unlike the pure diffusion case initially studied by Wang et al. (2020), the derivation of the exploratory dynamics under jump-diffusions calls for a careful formulation of the jump part. Through a theoretical analysis, we find that one can simply use the same policy evaluation and $q$-learning algorithms in Jia and Zhou (2022a, 2023), originally developed for controlled diffusions, without needing to check a priori whether the underlying data come from a pure diffusion or a jump-diffusion. However, we show that the presence of jumps ought to affect parameterizations of actors and critics in general. We investigate as an application the mean--variance portfolio selection problem with stock price modelled as a jump-diffusion, and show that both RL algorithms and parameterizations are invariant with respect to jumps. Finally, we present a detailed study on applying the general theory to option hedging.
