Bad-Policy Density: A Measure of Reinforcement Learning Hardness

Abstract

Reinforcement learning is hard in general. Yet, in many specific environments, learning is easy. What makes learning easy in one environment, but difficult in another? We address this question by proposing a simple measure of reinforcement-learning hardness called the bad-policy density. This quantity measures the fraction of the deterministic stationary policy space that is below a desired threshold in value. We prove that this simple quantity has many properties one would expect of a measure of learning hardness. Further, we prove it is NP-hard to compute the measure in general, but there are paths to polynomial-time approximation. We conclude by summarizing potential directions and uses for this measure.
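
To make the definition concrete, the following is a minimal sketch of how the bad-policy density might be computed by brute force on a tiny MDP. It is an illustration under stated assumptions, not the paper's construction: the abstract does not specify which value is compared to the threshold, so the sketch assumes the discounted value at a single start state, and it counts a policy as bad when that value is strictly below the threshold. The names `policy_value`, `bad_policy_density`, `P`, `R`, and the two-state example are all hypothetical.

```python
from itertools import product


def policy_value(P, R, policy, gamma, start, iters=1000):
    """Approximate the discounted value of a deterministic stationary policy.

    P[s][a] is a list of (next_state, probability) pairs and R[s][a] is the
    expected immediate reward (an assumed tabular encoding, for illustration).
    Uses plain iterative policy evaluation and returns the value at `start`.
    """
    n_states = len(P)
    v = [0.0] * n_states
    for _ in range(iters):
        v = [
            R[s][policy[s]]
            + gamma * sum(p * v[s2] for s2, p in P[s][policy[s]])
            for s in range(n_states)
        ]
    return v[start]


def bad_policy_density(P, R, gamma, start, threshold):
    """Fraction of deterministic stationary policies whose start-state value
    is strictly below `threshold` (assumed reading of the definition)."""
    action_ranges = [range(len(P[s])) for s in range(len(P))]
    total = 0
    bad = 0
    # Enumerate every assignment of one action to each state.
    for policy in product(*action_ranges):
        total += 1
        if policy_value(P, R, policy, gamma, start) < threshold:
            bad += 1
    return bad / total


# Hypothetical two-state, two-action MDP: only "move to state 1, then stay
# there" collects reward; the other three policies earn nothing.
P = [
    [[(0, 1.0)], [(1, 1.0)]],  # state 0: action 0 stays, action 1 moves to 1
    [[(1, 1.0)], [(0, 1.0)]],  # state 1: action 0 stays, action 1 moves to 0
]
R = [
    [0.0, 0.0],  # no reward from state 0
    [1.0, 0.0],  # reward 1 for staying in state 1
]

density = bad_policy_density(P, R, gamma=0.9, start=0, threshold=5.0)
print(density)
```

In this example three of the four deterministic policies fall below the threshold, so the density is 0.75. The enumeration visits every policy, so its cost grows exponentially with the number of states; the sketch is only practical for very small problems.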
