Abstract
Reinforcement learning typically assumes that the state update resulting from previous actions is available instantaneously and can therefore be used for making future decisions. However, this may not always be true. When the state update is not available, the decision is made partly blind, since it cannot rely on the current state information. This paper proposes an approach in which the delay in the knowledge of the state is taken into account, and decisions are made based on the available information, which may not include the current state information. One approach could be to include the actions taken after the last-known state as part of the state information; however, that leads to an enlarged state space, making the problem more complex and slower to converge. The proposed algorithm gives an alternative approach in which the state space is not enlarged compared to the case where there is no delay in the state update. Evaluations on basic RL environments further illustrate the improved performance of the proposed algorithm.