MDP Playground: Meta-Features in Reinforcement Learning

Abstract

Reinforcement Learning (RL) algorithms usually assume their environment to be a Markov Decision Process (MDP). Additionally, they do not try to identify specific features of environments which could help them perform better. Here, we present a few key meta-features of environments: delayed rewards, specific reward sequences, sparsity of rewards, and stochasticity of environments, which may violate the MDP assumptions and adapting to which should help RL agents perform better. While it is very time consuming to run RL algorithms on standard benchmarks, we define a parameterised collection of fast-to-run toy benchmarks in OpenAI Gym by varying these meta-features. Despite their toy nature and low compute requirements, we show that these benchmarks present substantial difficulties to current RL algorithms. Furthermore, since we can generate environments with a desired value for each of the meta-features, we have fine-grained control over the environments' difficulty and also have the ground truth available for evaluating algorithms. We believe that devising algorithms that can detect such meta-features of environments and adapt to them will be key to creating robust RL algorithms that work in a variety of different real-world problems.
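
To make two of the meta-features above concrete, here is a minimal sketch of a toy environment with a configurable reward delay and transition noise. It is an illustration only, not the paper's implementation: the class name `ToyDelayedRewardEnv`, its parameters (`delay`, `transition_noise`), and the choice of a single rewarding state are all assumptions made for this example, and it deliberately avoids the Gym API to stay self-contained.

```python
import random
from collections import deque


class ToyDelayedRewardEnv:
    """Hypothetical toy environment (an assumption, not the paper's code)."""

    def __init__(self, num_states=4, num_actions=2, delay=0,
                 transition_noise=0.0, seed=0):
        self.rng = random.Random(seed)
        self.num_states = num_states
        self.num_actions = num_actions
        self.delay = delay                        # steps before a reward is handed out
        self.transition_noise = transition_noise  # probability of a random next state
        # Fixed random transition table: table[state][action] -> next state.
        self.table = [[self.rng.randrange(num_states) for _ in range(num_actions)]
                      for _ in range(num_states)]
        # Assumption for this sketch: only the last state yields reward.
        self.rewarding_state = num_states - 1
        self.reset()

    def reset(self):
        self.state = 0
        # Rewards that have been earned but not yet delivered to the agent.
        self.pending = deque([0.0] * self.delay)
        return self.state

    def step(self, action):
        # Stochasticity: with probability transition_noise, ignore the table.
        if self.rng.random() < self.transition_noise:
            next_state = self.rng.randrange(self.num_states)
        else:
            next_state = self.table[self.state][action]
        self.state = next_state
        earned = 1.0 if next_state == self.rewarding_state else 0.0
        # Delay: the reward earned now is delivered `delay` steps later.
        self.pending.append(earned)
        reward = self.pending.popleft()
        return next_state, reward
```

With `delay > 0`, the reward the agent observes at a step depends on where it was `delay` steps earlier rather than on the current state and action, which is one way the usual MDP assumption can be violated from the agent's point of view.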
