On Improving Deep Reinforcement Learning for POMDPs

Abstract

Deep Reinforcement Learning (RL) has recently emerged as one of the most competitive approaches for learning in sequential decision-making problems with fully observable environments, e.g., computer Go. However, very little work has been done in deep RL to handle partially observable environments. We propose a new architecture called Action-specific Deep Recurrent Q-Network (ADRQN) to enhance learning performance in partially observable domains. Actions are encoded by a fully connected layer and coupled with a convolutional observation to form an action-observation pair. The time series of action-observation pairs is then integrated by an LSTM layer that learns latent states, based on which a fully connected layer computes Q-values as in conventional Deep Q-Networks (DQNs). We demonstrate the effectiveness of our new architecture in several partially observable domains, including flickering Atari games.
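The data flow described in the abstract can be illustrated with a minimal, dependency-free forward pass. This is only a sketch under stated assumptions, not the authors' implementation: the class name `ADRQNSketch`, all layer sizes, the single convolution filter, the omission of bias terms, the random untrained weights, and the choice to pair each observation with the action taken just before it are illustrative assumptions that the abstract does not specify. Training (replay, targets, loss) is not shown.

```python
import math
import random


def rand_matrix(rows, cols, rng):
    """Small random weight matrix (untrained placeholder weights)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(W, x):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def conv_features(obs, kernel):
    """Single-filter valid 2D convolution followed by ReLU, flattened."""
    k = len(kernel)
    out = []
    for i in range(len(obs) - k + 1):
        for j in range(len(obs[0]) - k + 1):
            s = sum(kernel[a][b] * obs[i + a][j + b]
                    for a in range(k) for b in range(k))
            out.append(max(0.0, s))
    return out


class ADRQNSketch:
    """Hypothetical ADRQN-style network; all sizes are assumptions."""

    def __init__(self, obs_size=6, n_actions=4, action_dim=3,
                 hidden=8, kernel=3, seed=0):
        rng = random.Random(seed)
        self.n_actions, self.hidden = n_actions, hidden
        self.kernel = rand_matrix(kernel, kernel, rng)          # conv filter
        conv_dim = (obs_size - kernel + 1) ** 2
        self.W_action = rand_matrix(action_dim, n_actions, rng)  # FC action encoder
        in_dim = action_dim + conv_dim + hidden
        # One weight matrix per LSTM gate: input, forget, output, candidate.
        self.W_gates = [rand_matrix(hidden, in_dim, rng) for _ in range(4)]
        self.W_q = rand_matrix(n_actions, hidden, rng)           # FC Q-value head

    def initial_state(self):
        return [0.0] * self.hidden, [0.0] * self.hidden

    def step(self, prev_action, obs, state):
        """Process one action-observation pair; return Q-values and new state."""
        h, c = state
        one_hot = [1.0 if a == prev_action else 0.0 for a in range(self.n_actions)]
        a_enc = [max(0.0, v) for v in matvec(self.W_action, one_hot)]
        o_enc = conv_features(obs, self.kernel)
        x = a_enc + o_enc + h  # action-observation pair plus previous hidden state
        i_g = [sigmoid(v) for v in matvec(self.W_gates[0], x)]
        f_g = [sigmoid(v) for v in matvec(self.W_gates[1], x)]
        o_g = [sigmoid(v) for v in matvec(self.W_gates[2], x)]
        g = [math.tanh(v) for v in matvec(self.W_gates[3], x)]
        c = [f * cv + i * gv for f, cv, i, gv in zip(f_g, c, i_g, g)]
        h = [o * math.tanh(cv) for o, cv in zip(o_g, c)]
        return matvec(self.W_q, h), (h, c)


# Usage: roll the network over a short episode of random 6x6 observations,
# acting greedily with respect to the (untrained) Q-values.
net = ADRQNSketch()
state = net.initial_state()
obs_rng = random.Random(1)
prev_action = 0
for t in range(5):
    obs = [[obs_rng.random() for _ in range(6)] for _ in range(6)]
    q_values, state = net.step(prev_action, obs, state)
    prev_action = max(range(len(q_values)), key=q_values.__getitem__)
```

The recurrent state `(h, c)` is what carries information across time steps, so the Q-values at step t depend on the whole history of action-observation pairs rather than on the current observation alone, which is the property that matters when observations are partial or intermittently missing.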
