The Value Function Polytope in Reinforcement Learning

  • 2019-01-31 18:45:04
  • Robert Dadashi, Adrien Ali Taïga, Nicolas Le Roux, Dale Schuurmans, Marc G. Bellemare
We establish geometric and topological properties of the space of value functions in finite state-action Markov decision processes. Our main contribution is the characterization of the nature of its shape: a general polytope (Aigner et al., 2010). To demonstrate this result, we exhibit several properties of the structural relationship between policies and value functions including the line theorem, which shows that the value functions of policies constrained on all but one state describe a line segment. Finally, we use this novel perspective to introduce visualizations to enhance the understanding of the dynamics of reinforcement learning algorithms.
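As an illustration of the line theorem mentioned above, the following is a minimal numerical sketch, not the authors' code. It assumes a hypothetical 2-state, 2-action MDP whose transition probabilities, rewards, and discount factor are invented for the example. The policy at state 1 is held fixed while the policy at state 0 is swept from one action to the other, and the resulting value functions are checked for collinearity in the plane. The names `value_function` and `line_theorem_check` are illustrative only.

```python
GAMMA = 0.9  # discount factor (assumed for this example)

# Hypothetical MDP: P[s][a] is the next-state distribution, R[s][a] the expected reward.
P = [
    [[0.7, 0.3], [0.2, 0.8]],
    [[0.1, 0.9], [0.6, 0.4]],
]
R = [
    [-0.45, -0.1],
    [0.5, 0.5],
]


def value_function(policy):
    """Solve (I - GAMMA * P_pi) V = r_pi for two states using Cramer's rule.

    policy[s] is the probability of taking action 1 in state s.
    """
    p_pi, r_pi = [], []
    for s in range(2):
        w0, w1 = 1.0 - policy[s], policy[s]
        p_pi.append([w0 * P[s][0][t] + w1 * P[s][1][t] for t in range(2)])
        r_pi.append(w0 * R[s][0] + w1 * R[s][1])
    a, b = 1.0 - GAMMA * p_pi[0][0], -GAMMA * p_pi[0][1]
    c, d = -GAMMA * p_pi[1][0], 1.0 - GAMMA * p_pi[1][1]
    det = a * d - b * c
    v0 = (r_pi[0] * d - b * r_pi[1]) / det
    v1 = (a * r_pi[1] - c * r_pi[0]) / det
    return (v0, v1)


def line_theorem_check(fixed_state1_prob=0.3, steps=11):
    """Vary the policy at state 0 only and measure how far the values stray from a line."""
    values = [
        value_function((k / (steps - 1), fixed_state1_prob)) for k in range(steps)
    ]
    (x0, y0), (x1, y1) = values[0], values[-1]
    dx, dy = x1 - x0, y1 - y0
    # The cross product is zero exactly when a point lies on the line through the endpoints.
    max_cross = max(abs((x - x0) * dy - (y - y0) * dx) for x, y in values)
    # The projection parameter t lies in [0, 1] when a point sits between the endpoints.
    ts = [((x - x0) * dx + (y - y0) * dy) / (dx * dx + dy * dy) for x, y in values]
    return values, max_cross, ts


values, max_cross, ts = line_theorem_check()
print(f"max deviation from the line: {max_cross:.2e}")
```

Under these assumptions the deviation should be zero up to floating-point error, and every intermediate value function should fall between the two endpoints, which correspond to the policies that are deterministic at state 0.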


