Bad-Policy Density: A Measure of Reinforcement Learning Hardness

  • 2021-10-07 13:09:16
  • David Abel, Cameron Allen, Dilip Arumugam, D. Ellis Hershkowitz, Michael L. Littman, Lawson L. S. Wong
  • 3

Abstract

Reinforcement learning is hard in general. Yet, in many specificenvironments, learning is easy. What makes learning easy in one environment,but difficult in another? We address this question by proposing a simplemeasure of reinforcement-learning hardness called the bad-policy density. Thisquantity measures the fraction of the deterministic stationary policy spacethat is below a desired threshold in value. We prove that this simple quantityhas many properties one would expect of a measure of learning hardness.Further, we prove it is NP-hard to compute the measure in general, but thereare paths to polynomial-time approximation. We conclude by summarizingpotential directions and uses for this measure.

 

Quick Read (beta)

loading the full paper ...