Abstract
Reinforcement learning is challenging in delayed scenarios, a common real-world situation where observations and interactions occur with delays. State-of-the-art (SOTA) state-augmentation techniques suffer either from state-space explosion as the number of delayed steps grows, or from performance degradation in stochastic environments. To address these challenges, our novel Auxiliary-Delayed Reinforcement Learning (AD-RL) leverages an auxiliary short-delayed task to accelerate learning on a long-delayed task without compromising performance in stochastic environments. Specifically, AD-RL learns the value function in the short-delayed task and then employs it with bootstrapping and policy improvement techniques in the long-delayed task. We theoretically show that this can greatly reduce the sample complexity compared to learning directly on the original long-delayed task. On deterministic and stochastic benchmarks, our method remarkably outperforms SOTA methods in both sample efficiency and policy performance.