Reinforcement Learning from Delayed Observations via World Models

Abstract

In standard reinforcement learning settings, agents typically assume immediate feedback about the effects of their actions after taking them. However, in practice, this assumption may not hold true due to physical constraints and can significantly impact the performance of learning algorithms. In this paper, we address observation delays in partially observable environments. We propose leveraging world models, which have shown success in integrating past observations and learning dynamics, to handle observation delays. By reducing delayed POMDPs to delayed MDPs with world models, our methods can effectively handle partial observability, where existing approaches achieve sub-optimal performance or degrade quickly as observability decreases. Experiments suggest that one of our methods can outperform a naive model-based approach by up to 250%. Moreover, we evaluate our methods on visual delayed environments, for the first time showcasing delay-aware reinforcement learning continuous control with visual observations.
