BCR-DRL: Behavior- and Context-aware Reward for Deep Reinforcement Learning in Human-AI Coordination

Abstract

Deep reinforcement learning (DRL) offers a powerful framework for training AI agents to coordinate with human partners. However, DRL faces two critical challenges in human-AI coordination (HAIC): sparse rewards and unpredictable human behaviors. These challenges significantly limit DRL's ability to identify effective coordination policies, because they impair its ability to optimize exploration and exploitation. To address these limitations, we propose an innovative behavior- and context-aware reward (BCR) for DRL, which optimizes exploration and exploitation by leveraging human behaviors and contextual information in HAIC. Our BCR consists of two components: (i) a novel dual intrinsic rewarding scheme to enhance exploration, which combines an AI self-motivated intrinsic reward and a human-motivated intrinsic reward, both designed to increase the capture of sparse rewards through a logarithmic-based strategy; and (ii) a new context-aware weighting mechanism for the designed rewards to improve exploitation, which helps the AI agent prioritize actions that better coordinate with the human partner by using contextual information that reflects the evolution of learning. Extensive simulations in the Overcooked environment demonstrate that our approach can increase the cumulative sparse rewards by approximately 20% and improve the sample efficiency by around 38% compared to state-of-the-art baselines.
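
The abstract does not state the reward formulas, so the following is only a minimal sketch of the general idea, not the paper's formulation. It assumes a count-based novelty bonus that decays logarithmically, one bonus keyed on the AI agent's own state and one keyed on the observed human action, and a hypothetical context weight that shrinks the intrinsic terms as sparse rewards become more frequent. The names `log_bonus`, `context_weight`, `shaped_reward`, and `recent_sparse_mean` are illustrative assumptions.

```python
import math
from collections import Counter


def log_bonus(counts: Counter, key) -> float:
    """Assumed novelty bonus: decays logarithmically with the visit count."""
    counts[key] += 1
    return 1.0 / (1.0 + math.log(counts[key]))


def context_weight(recent_sparse_mean: float) -> float:
    """Hypothetical context weight: smaller once sparse rewards are common."""
    return 1.0 / (1.0 + max(recent_sparse_mean, 0.0))


def shaped_reward(sparse, ai_state, human_action,
                  ai_counts, human_counts, recent_sparse_mean):
    """Sparse task reward plus weighted AI- and human-motivated bonuses."""
    r_ai = log_bonus(ai_counts, ai_state)                         # self-motivated term
    r_human = log_bonus(human_counts, (ai_state, human_action))   # human-motivated term
    w = context_weight(recent_sparse_mean)
    return sparse + w * (r_ai + r_human)


ai_counts, human_counts = Counter(), Counter()
first = shaped_reward(0.0, "s0", "a0", ai_counts, human_counts, 0.0)
second = shaped_reward(0.0, "s0", "a0", ai_counts, human_counts, 0.0)
```

Under these assumptions, repeating the same state and human action yields a smaller shaped reward on the second visit, and a larger `recent_sparse_mean` scales both intrinsic terms down, so the agent leans more on the task reward as learning progresses.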
