From Sparse to Dense: Toddler-inspired Reward Transition in Goal-Oriented Reinforcement Learning

Abstract

Reinforcement learning (RL) agents often struggle to balance exploration and exploitation, particularly in environments where sparse or dense rewards bias learning. Biological systems such as human toddlers naturally navigate this balance, transitioning from free exploration under sparse rewards to goal-directed behavior guided by increasingly dense rewards. Inspired by this natural progression, we investigate the Toddler-Inspired Reward Transition in goal-oriented RL tasks. Our study focuses on transitioning from sparse to potential-based dense (S2D) rewards while preserving optimal strategies. Through experiments on dynamic robotic arm manipulation and egocentric 3D navigation tasks, we demonstrate that effective S2D reward transitions significantly enhance learning performance and sample efficiency. Additionally, using a Cross-Density Visualizer, we show that S2D transitions smooth the policy loss landscape, resulting in wider minima that improve generalization in RL models. Finally, we reinterpret Tolman's maze experiments, underscoring the critical role of early free exploratory learning in the context of S2D rewards.
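The sparse-to-dense idea can be illustrated with a minimal Python sketch. It assumes a hypothetical goal-reaching task with continuous coordinates, a sparse reward of 1.0 only within a small tolerance of the goal, a hypothetical potential Φ(s) equal to the negative Euclidean distance to the goal, and a single hard switch at an arbitrary `switch_step`. None of these choices, nor the names `sparse_reward`, `potential`, `shaped_reward`, and `s2d_reward`, come from the paper. The dense phase adds the standard potential-based shaping term γΦ(s′) − Φ(s) (Ng, Harada, and Russell, 1999), the form known to leave optimal policies unchanged, which is why potential-based dense rewards can be introduced without altering the optimal strategy.

```python
import math

GAMMA = 0.99  # assumed discount factor, not taken from the paper


def sparse_reward(next_state, goal, tol=0.05):
    """Sparse signal: 1.0 only when the agent lands within `tol` of the goal."""
    return 1.0 if math.dist(next_state, goal) <= tol else 0.0


def potential(state, goal):
    """Hypothetical potential Phi(s): negative Euclidean distance to the goal."""
    return -math.dist(state, goal)


def shaped_reward(state, next_state, goal, gamma=GAMMA):
    """Dense reward: sparse reward plus the shaping term gamma*Phi(s') - Phi(s)."""
    shaping = gamma * potential(next_state, goal) - potential(state, goal)
    return sparse_reward(next_state, goal) + shaping


def s2d_reward(step, state, next_state, goal, switch_step=50_000, gamma=GAMMA):
    """Hypothetical S2D schedule: sparse before `switch_step`, dense afterwards."""
    if step < switch_step:
        return sparse_reward(next_state, goal)
    return shaped_reward(state, next_state, goal, gamma)
```

For example, moving from (0, 0) to (0.5, 0) toward a goal at (1, 0) earns 0.0 before the switch and roughly 0.505 after it. Because the discounted shaping terms telescope to γ^T Φ(s_T) − Φ(s_0) over a trajectory, they depend only on its endpoints, not on the path taken between them.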
