Towards Automated Semantic Interpretability in Reinforcement Learning via Vision-Language Models

Abstract

Semantic Interpretability in Reinforcement Learning (RL) enables transparency, accountability, and safer deployment by making the agent's decisions understandable and verifiable. Achieving this, however, requires a feature space composed of human-understandable concepts, which traditionally rely on human specification and fail to generalize to unseen environments. In this work, we introduce Semantically Interpretable Reinforcement Learning with Vision-Language Models Empowered Automation (SILVA), an automated framework that leverages pre-trained vision-language models (VLMs) for semantic feature extraction and interpretable tree-based models for policy optimization. SILVA first queries a VLM to identify relevant semantic features for an unseen environment, then extracts these features from the environment. Finally, it trains an Interpretable Control Tree via RL, mapping the extracted features to actions in a transparent and interpretable manner. To address the computational inefficiency of extracting features directly with VLMs, we develop a feature extraction pipeline that generates a dataset for training a lightweight convolutional network, which is subsequently used during RL. By leveraging VLMs to automate tree-based RL, SILVA removes the reliance on human annotation previously required by interpretable models, while also overcoming the inability of VLMs alone to generate valid robot policies. This enables semantically interpretable reinforcement learning without a human in the loop.
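To make the final stage more concrete, the following is a minimal sketch of how an axis-aligned tree policy over named semantic features might map features to actions at inference time, and how its decision path can be read back as an explanation. It is not the authors' Interpretable Control Tree. The feature name `ball_y_rel` (ball height relative to the paddle in a Pong-like game), the action indices, and the `Node`/`Leaf`/`act`/`explain` helpers are all hypothetical. The RL training that would choose the splits and thresholds, as well as the VLM and CNN feature extraction that would produce the feature values, are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Leaf:
    action: int  # discrete action index emitted at this leaf (hypothetical indices)


@dataclass
class Node:
    feature: str      # name of a semantic feature (hypothetical, e.g. "ball_y_rel")
    threshold: float  # go left if the feature value is <= threshold, else right
    left: "Tree"
    right: "Tree"


Tree = Union[Node, Leaf]


def act(tree: Tree, features: Dict[str, float]) -> int:
    """Traverse from the root to a leaf and return that leaf's action."""
    while isinstance(tree, Node):
        tree = tree.left if features[tree.feature] <= tree.threshold else tree.right
    return tree.action


def explain(tree: Tree, features: Dict[str, float]) -> List[str]:
    """Return the human-readable sequence of tests taken for this observation."""
    path = []
    while isinstance(tree, Node):
        value = features[tree.feature]
        if value <= tree.threshold:
            path.append(f"{tree.feature}={value:g} <= {tree.threshold:g}")
            tree = tree.left
        else:
            path.append(f"{tree.feature}={value:g} > {tree.threshold:g}")
            tree = tree.right
    path.append(f"action={tree.action}")
    return path


# Hypothetical hand-written policy: move UP (2) if the ball is well above the
# paddle, DOWN (3) if well below, otherwise NOOP (0).
policy = Node("ball_y_rel", -0.1,
              Leaf(2),
              Node("ball_y_rel", 0.1, Leaf(0), Leaf(3)))

if __name__ == "__main__":
    print(act(policy, {"ball_y_rel": 0.3}))
    print(explain(policy, {"ball_y_rel": 0.3}))
```

Because each internal node tests a single named feature against a threshold, the decision path doubles as the explanation; this transparency is the property that the semantic feature space is meant to provide.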
