Reinforcement Learning using Guided Observability

Abstract

Due to recent breakthroughs, reinforcement learning (RL) has demonstrated impressive performance in challenging sequential decision-making problems. However, an open question is how to make RL cope with partial observability, which is prevalent in many real-world problems. In contrast to contemporary RL approaches, which focus mostly on improved memory representations or on strong assumptions about the type of partial observability, we propose a simple but efficient approach that can be combined with a wide variety of RL methods. Our main insight is that smoothly transitioning from full observability to partial observability during training yields a high-performance policy. The approach, called partially observable guided reinforcement learning (PO-GRL), allows full state information to be used during policy optimization without compromising the optimality of the final policy. A comprehensive evaluation on discrete partially observable Markov decision process (POMDP) benchmark problems and on continuous partially observable MuJoCo and OpenAI Gym tasks shows that PO-GRL improves performance. Finally, we demonstrate PO-GRL on the ball-in-the-cup task on a real Barrett WAM robot under partial observability.
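
To make the main insight concrete, the following is a minimal sketch of one way a training loop could move smoothly from full to partial observability. It is not taken from the paper: the linear schedule, the per-step random mixing, and the names `mixing_probability` and `guided_observation` are all assumptions made for illustration, as is the premise that the partial observation has the same shape as the full state (for example, with hidden entries zeroed).

```python
import random


def mixing_probability(step, total_steps):
    """Probability of exposing the full state at a given training step.

    Assumed schedule (not from the paper): linear decay from 1.0 at
    step 0 to 0.0 at total_steps, clamped outside that range.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 1.0 - progress


def guided_observation(full_state, partial_obs, step, total_steps, rng=random):
    """Return the full state with the scheduled probability, else the partial observation.

    Hypothetical helper: assumes both inputs have the same shape so that
    a single policy network can consume either one.
    """
    p_full = mixing_probability(step, total_steps)
    # rng.random() lies in [0, 1), so p_full == 1.0 always yields the full
    # state and p_full == 0.0 always yields the partial observation.
    return full_state if rng.random() < p_full else partial_obs


if __name__ == "__main__":
    full_state = [0.3, -1.2]   # e.g. position and velocity
    partial_obs = [0.3, 0.0]   # velocity hidden (zeroed) under partial observability
    for step in (0, 250, 500, 750, 1000):
        obs = guided_observation(full_state, partial_obs, step, total_steps=1000)
        print(step, mixing_probability(step, 1000), obs)
```

Under these assumptions the policy sees only full states at the start of training and only partial observations at the end, so the final policy is trained and evaluated on the inputs that are available at deployment.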
