Variable-Agnostic Causal Exploration for Reinforcement Learning

Abstract

Modern reinforcement learning (RL) struggles to capture real-world cause-and-effect dynamics, leading to inefficient exploration due to extensive trial-and-error actions. While recent efforts to improve agent exploration have leveraged causal discovery, they often make unrealistic assumptions of causal variables in the environments. In this paper, we introduce a novel framework, Variable-Agnostic Causal Exploration for Reinforcement Learning (VACERL), incorporating causal relationships to drive exploration in RL without specifying environmental causal variables. Our approach automatically identifies crucial observation-action steps associated with key variables using attention mechanisms. Subsequently, it constructs the causal graph connecting these steps, which guides the agent towards observation-action pairs with greater causal influence on task completion. This can be leveraged to generate intrinsic rewards or establish a hierarchy of subgoals to enhance exploration efficiency. Experimental results showcase a significant improvement in agent performance in grid-world, 2D games and robotic domains, particularly in scenarios with sparse rewards and noisy actions, such as the notorious Noisy-TV environments.
