Conservative Q-Improvement: Reinforcement Learning for an Interpretable Decision-Tree Policy

Abstract

There is a growing desire in the field of reinforcement learning (and machine learning in general) to move from black-box models toward more "interpretable AI." We improve the interpretability of reinforcement learning by increasing the utility of decision tree policies learned via reinforcement learning. These policies consist of a decision tree over the state space, which requires fewer parameters to express than traditional policy representations. Existing methods for creating decision tree policies via reinforcement learning focus on accurately representing an action-value function during training, but this leads to much larger trees than would otherwise be required. To address this shortcoming, we propose a novel algorithm that increases tree size only when the estimated discounted future reward of the overall policy would increase by a sufficient amount. Through evaluation in a simulated environment, we show that its performance is comparable or superior to that of traditional tree-based approaches and that it yields a more succinct policy. Additionally, we discuss tuning parameters to control the tradeoff between optimizing for smaller tree size and optimizing for overall reward.
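The abstract describes the tree policy and its growth rule only at a high level. The Python sketch below shows one way a conservative split test of this kind could look. It is an illustration, not the authors' implementation. It assumes discrete actions, leaves that store one Q-value estimate per action, candidate splits that track their own per-child Q-estimates and visit counts, and a hypothetical `min_gain` threshold standing in for the paper's tuning parameters. The gain is measured locally at a single leaf. Weighting it by how often that leaf is reached would be one way to approximate the effect on the overall policy.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Union


@dataclass
class Leaf:
    """Leaf of a tree policy: one Q-value estimate per discrete action (assumed layout)."""
    q: List[float]

    def value(self) -> float:
        # Greedy value of the leaf: the best estimated action value.
        return max(self.q)


@dataclass
class Node:
    """Internal node: a state goes left if state[feature] < threshold, else right."""
    feature: int
    threshold: float
    left: Union["Node", Leaf]
    right: Union["Node", Leaf]


def act(tree: Union[Node, Leaf], state: Sequence[float]) -> int:
    """Descend the tree to a leaf and return that leaf's greedy action index."""
    while isinstance(tree, Node):
        tree = tree.left if state[tree.feature] < tree.threshold else tree.right
    return max(range(len(tree.q)), key=lambda a: tree.q[a])


@dataclass
class CandidateSplit:
    """Hypothetical bookkeeping for one way a leaf could be split."""
    feature: int
    threshold: float
    q_left: List[float]   # Q-estimates for states that would go left
    q_right: List[float]  # Q-estimates for states that would go right
    visits_left: int = 0
    visits_right: int = 0


def expected_gain(leaf: Leaf, split: CandidateSplit) -> float:
    """Estimated increase in the leaf's greedy value if the split were made."""
    total = split.visits_left + split.visits_right
    if total == 0:
        return 0.0  # no evidence yet, so claim no improvement
    p_left = split.visits_left / total
    # Value after splitting: visit-weighted average of the children's greedy values.
    value_after = p_left * max(split.q_left) + (1.0 - p_left) * max(split.q_right)
    return value_after - leaf.value()


def should_split(leaf: Leaf, candidates: List[CandidateSplit],
                 min_gain: float) -> Optional[CandidateSplit]:
    """Return the best candidate only if its estimated gain exceeds min_gain."""
    if not candidates:
        return None
    best = max(candidates, key=lambda s: expected_gain(leaf, s))
    return best if expected_gain(leaf, best) > min_gain else None
```

Raising `min_gain` yields fewer splits and therefore smaller trees, possibly at some cost in reward. Lowering it lets the tree grow more freely. A candidate split that would only refine the Q-estimates without raising the greedy value has an estimated gain of zero, so it is rejected. This illustrates the contrast the abstract draws with methods that grow the tree in order to represent the action-value function accurately.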
