Interpretable Control by Reinforcement Learning

Abstract

In this paper, three recently introduced reinforcement learning (RL) methods are used to generate human-interpretable policies for the cart-pole balancing benchmark. The novel RL methods learn human-interpretable policies in the form of compact fuzzy controllers and simple algebraic equations. The representations as well as the achieved control performances are compared with those of two classical controller design methods and three non-interpretable RL methods. All eight methods utilize the same previously generated data batch and produce their controller offline, without interaction with the real benchmark dynamics. The experiments show that the novel RL methods are able to automatically generate well-performing policies which are at the same time human-interpretable. Furthermore, one of the methods is applied to automatically learn an equation-based policy for a hardware cart-pole demonstrator by using only human-player-generated batch data. The solution generated in the first attempt already represents a successful balancing policy, which demonstrates the method's applicability to real-world problems.
