Interpretable Policies for Reinforcement Learning by Genetic Programming

  • 2017-12-12 08:31:51
  • Daniel Hein, Steffen Udluft, Thomas A. Runkler

Abstract

The search for interpretable reinforcement learning policies is of high academic and industrial interest. Especially for industrial systems, domain experts are more likely to deploy autonomously learned controllers if they are understandable and convenient to evaluate. Basic algebraic equations are supposed to meet these requirements, as long as they are restricted to an adequate complexity. Here we introduce the genetic programming for reinforcement learning (GPRL) approach based on model-based batch reinforcement learning and genetic programming, which autonomously learns policy equations from pre-existing default state-action trajectory samples. GPRL is compared to a straightforward method which utilizes genetic programming for symbolic regression, yielding policies imitating an existing well-performing, but non-interpretable policy. Experiments on three reinforcement learning benchmarks, i.e., mountain car, cart-pole balancing, and the industrial benchmark, demonstrate the superiority of our GPRL approach compared to the symbolic regression method. GPRL is capable of producing well-performing interpretable reinforcement learning policies from pre-existing default trajectory data.
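
To make the idea of scoring a policy equation against a model concrete, the following is a minimal sketch and not the authors' implementation. It assumes the textbook mountain-car equations as a stand-in for a transition model learned from batch data, and it only evaluates two hand-written candidate equations instead of running a genetic programming search. The names `model_step`, `rollout_return`, and `fitness`, as well as the horizon, the discount factor, and the start states, are illustrative choices that do not come from the paper.

```python
import math


def model_step(pos, vel, action):
    """Stand-in for a learned transition model (textbook mountain-car equations)."""
    action = max(-1.0, min(1.0, action))           # clip the action to [-1, 1]
    vel = vel + 0.001 * action - 0.0025 * math.cos(3.0 * pos)
    vel = max(-0.07, min(0.07, vel))               # velocity limits
    pos = pos + vel
    if pos < -1.2:                                 # inelastic left wall
        pos, vel = -1.2, 0.0
    reward = 0.0 if pos >= 0.5 else -1.0           # -1 per step until the goal
    return pos, vel, reward


def rollout_return(policy, pos, vel, horizon=500, gamma=0.99):
    """Discounted return of one model rollout under a policy equation."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        pos, vel, reward = model_step(pos, vel, policy(pos, vel))
        total += discount * reward
        discount *= gamma
        if pos >= 0.5:                             # goal reached, stop early
            break
    return total


def fitness(policy, start_states):
    """Average model-based return over a set of start states."""
    returns = [rollout_return(policy, p, v) for p, v in start_states]
    return sum(returns) / len(returns)


# Two hypothetical candidate policy equations, written as short algebraic rules.
candidates = {
    "a = 100 * v": lambda pos, vel: 100.0 * vel,   # push along the velocity
    "a = 1": lambda pos, vel: 1.0,                 # constant full throttle
}

start_states = [(-0.6, 0.0), (-0.5, 0.0), (-0.4, 0.0)]
scores = {name: fitness(policy, start_states) for name, policy in candidates.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
print("selected:", best)
```

In a genetic programming setting, a fitness of this kind would be computed for every individual in a population of equation trees, and selection, crossover, and mutation would then favor the equations with the higher model-based return. Because each candidate is a short algebraic expression, a domain expert can read the selected rule directly.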

 
