Causal Information Prioritization for Efficient Reinforcement Learning

Abstract

Current Reinforcement Learning (RL) methods often suffer from sample-inefficiency, resulting from blind exploration strategies that neglect causal relationships among states, actions, and rewards. Although recent causal approaches aim to address this problem, they lack grounded modeling of reward-guided causal understanding of states and actions for goal-orientation, thus impairing learning efficiency. To tackle this issue, we propose a novel method named Causal Information Prioritization (CIP) that improves sample efficiency by leveraging factored MDPs to infer causal relationships between different dimensions of states and actions with respect to rewards, enabling the prioritization of causal information. Specifically, CIP identifies and leverages causal relationships between states and rewards to execute counterfactual data augmentation to prioritize high-impact state features under the causal understanding of the environments. Moreover, CIP integrates a causality-aware empowerment learning objective, which significantly enhances the agent's execution of reward-guided actions for more efficient exploration in complex environments. To fully assess the effectiveness of CIP, we conduct extensive experiments across 39 tasks in 5 diverse continuous control environments, encompassing both locomotion and manipulation skills learning with pixel-based and sparse reward settings. Experimental results demonstrate that CIP consistently outperforms existing RL methods across a wide range of scenarios.
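
To make the idea of counterfactual data augmentation concrete, the following is a minimal sketch under stated assumptions, not the procedure used by CIP. It assumes a state-to-reward causal mask has already been inferred (here hand-written as `reward_causal_mask`) and shows one common style of augmentation: state dimensions judged to have no causal effect on the reward are resampled from another transition in the batch, while reward-relevant dimensions and the reward itself are kept. The function name `counterfactual_augment`, the `Transition` layout, and the omission of next states are illustrative simplifications.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical transition layout for illustration: (state, action, reward).
Transition = Tuple[List[float], List[float], float]


def counterfactual_augment(
    batch: Sequence[Transition],
    reward_causal_mask: Sequence[bool],
    rng: random.Random,
) -> List[Transition]:
    """Create one counterfactual copy of each transition.

    Dimensions flagged True in reward_causal_mask are kept as observed.
    The remaining dimensions are copied from a randomly chosen donor
    transition. The reward is left unchanged, which is only valid if the
    masked-out dimensions truly have no causal effect on the reward.
    """
    augmented: List[Transition] = []
    for state, action, reward in batch:
        donor_state, _, _ = rng.choice(batch)
        new_state = [
            s if causal else d
            for s, d, causal in zip(state, donor_state, reward_causal_mask)
        ]
        augmented.append((new_state, list(action), reward))
    return augmented


# Toy usage: dimension 0 is assumed reward-relevant, dimension 1 is not.
batch = [([0.1, 5.0], [1.0], 0.3), ([0.9, -2.0], [0.0], 1.2)]
mask = [True, False]
aug = counterfactual_augment(batch, mask, random.Random(0))
```

In this sketch the augmented samples vary only in features that are assumed irrelevant to the reward, so a learner trained on them sees the reward-relevant features held fixed across many contexts. How CIP actually infers the mask and constructs counterfactual samples is described in the full paper, not here.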
