Abstract
Reinforcement learning is hard in general. Yet, in many specificenvironments, learning is easy. What makes learning easy in one environment,but difficult in another? We address this question by proposing a simplemeasure of reinforcement-learning hardness called the bad-policy density. Thisquantity measures the fraction of the deterministic stationary policy spacethat is below a desired threshold in value. We prove that this simple quantityhas many properties one would expect of a measure of learning hardness.Further, we prove it is NP-hard to compute the measure in general, but thereare paths to polynomial-time approximation. We conclude by summarizingpotential directions and uses for this measure.
Quick Read (beta)
loading the full paper ...