Fidelity-Induced Interpretable Policy Extraction for Reinforcement Learning

Abstract

Deep Reinforcement Learning (DRL) has achieved remarkable success in sequential decision-making problems. However, existing DRL agents make decisions in an opaque fashion, which hinders users from establishing trust in the agents and scrutinizing their weaknesses. While recent research has developed Interpretable Policy Extraction (IPE) methods for explaining how an agent takes actions, their explanations are often inconsistent with the agent's behavior and thus frequently fail to explain it. To tackle this issue, we propose a novel method, Fidelity-Induced Policy Extraction (FIPE). Specifically, we start by analyzing the optimization mechanism of existing IPE methods, elaborating on the issue of ignoring consistency while increasing cumulative rewards. We then design a fidelity-induced mechanism by integrating a fidelity measurement into the reinforcement learning feedback. We conduct experiments in the complex control environment of StarCraft II, an arena typically avoided by current IPE methods. The experimental results demonstrate that FIPE outperforms the baselines in terms of interaction performance and consistency, while remaining easy to understand.
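As a rough illustration of what "integrating a fidelity measurement into the reinforcement learning feedback" could look like, the sketch below adds an agreement bonus to the environment reward whenever the interpretable policy chooses the same action as the DRL agent. The abstract does not state the actual formula, so the additive form, the 0/1 agreement term, the `weight` parameter, and the function names are all assumptions made for illustration, not the authors' method.

```python
from typing import Hashable, Sequence


def fidelity(student_actions: Sequence[Hashable],
             teacher_actions: Sequence[Hashable]) -> float:
    """Fraction of states in which the interpretable (student) policy
    picks the same action as the DRL agent (teacher).

    Assumption: fidelity is measured as simple action agreement.
    """
    if len(student_actions) != len(teacher_actions):
        raise ValueError("action sequences must have the same length")
    if not student_actions:
        return 0.0
    matches = sum(s == t for s, t in zip(student_actions, teacher_actions))
    return matches / len(student_actions)


def fidelity_shaped_reward(env_reward: float,
                           student_action: Hashable,
                           teacher_action: Hashable,
                           weight: float = 0.5) -> float:
    """Environment reward plus a bonus when the student agrees with the teacher.

    Assumption: the fidelity term is added to the reward with a fixed
    hypothetical weight; the paper may combine the two differently.
    """
    agreement = 1.0 if student_action == teacher_action else 0.0
    return env_reward + weight * agreement
```

Under this reading, a policy that only maximizes `env_reward` can drift away from the agent it is meant to explain, whereas the shaped reward also pays for staying consistent with it.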
