Utilizing Explainability Techniques for Reinforcement Learning Model Assurance

Abstract

Explainable Reinforcement Learning (XRL) can provide transparency into the decision-making process of a Deep Reinforcement Learning (DRL) model and increase user trust and adoption in real-world use cases. By utilizing XRL techniques, researchers can identify potential vulnerabilities within a trained DRL model prior to deployment, thereby limiting the potential for mission failure or mistakes by the system. This paper introduces the ARLIN (Assured RL Model Interrogation) Toolkit, an open-source Python library that identifies potential vulnerabilities and critical points within trained DRL models through detailed, human-interpretable explainability outputs. To illustrate ARLIN's effectiveness, we provide explainability visualizations and vulnerability analysis for a publicly available DRL model. The open-source code repository is available for download at https://github.com/mitre/arlin.
