Exploiting Language Instructions for Interpretable and Compositional Reinforcement Learning

Abstract

In this work, we present an alternative approach to making an agent compositional through the use of a diagnostic classifier. Because of the need for explainable agents in automated decision processes, we attempt to interpret the latent space from an RL agent to identify its current objective in a complex language instruction. Results show that the classification process causes changes in the hidden states which makes them more easily interpretable, but also causes a shift in zero-shot performance to novel instructions. Lastly, we limit the supervisory signal on the classification, and observe a similar but less notable effect.
