Abstract
Many reinforcement learning algorithms are built on an assumption that an agent interacts with an environment over fixed-duration, discrete time steps. However, physical systems are continuous in time, requiring a choice of time-discretization granularity when digitally controlling them. Furthermore, such systems do not wait for decisions to be made before advancing the environment state, necessitating the study of how the choice of discretization may affect a reinforcement learning algorithm. In this work, we consider the relationship between the definitions of the continuous-time and discrete-time returns. Specifically, we acknowledge an idiosyncrasy with naively applying a discrete-time algorithm to a discretized continuous-time environment, and note how a simple modification can better align the return definitions. This observation is of practical consideration when dealing with environments where time-discretization granularity is a choice, or situations where such granularity is inherently stochastic.