Specifying and Interpreting Reinforcement Learning Policies through Simulatable Machine Learning

Abstract

Human-AI collaborative policy synthesis is a procedure in which (1) a human initializes an autonomous agent's behavior, (2) Reinforcement Learning improves the human-specified behavior, and (3) the agent can explain the final optimized policy to the user. This paradigm leverages human expertise and facilitates greater insight into the learned behaviors of an agent. Existing approaches to enabling collaborative policy specification involve black-box methods that are unintelligible and do not cater to non-expert end-users. In this paper, we develop a novel collaborative framework, rooted in principles of human-centered design, that enables humans to initialize and interpret an autonomous agent's behavior. Through our framework, humans specify an initial behavior model in the form of unstructured natural language, which we then convert to lexical decision trees. Next, we leverage these human-specified policies to warm-start reinforcement learning and allow the agent to further optimize the policies. Finally, to close the loop on human specification, we produce explanations of the final learned policy in multiple modalities, giving the user a final depiction of the agent's learned policy. We validate our approach by showing that our model can produce >80% accuracy, and that human-initialized policies are able to successfully warm-start RL. We then conduct a novel human-subjects study quantifying the relative subjective and objective benefits of varying XAI modalities (e.g., Tree, Language, and Program) for explaining learned policies to end-users, in terms of usability and interpretability, and identify the circumstances that influence these measures. Our findings emphasize the need for personalized explainable systems that can facilitate user-centric policy explanations for a variety of end-users.
