Fill in the Blanks: Accelerating Q-Learning with a Handful of Demonstrations in Sparse Reward Settings

Abstract

Reinforcement learning (RL) in sparse-reward environments remains a significant challenge due to the lack of informative feedback. We propose a simple yet effective method that uses a small number of successful demonstrations to initialize the value function of an RL agent. By precomputing value estimates from offline demonstrations and using them as targets for early learning, our approach provides the agent with a useful prior over promising actions. The agent then refines these estimates through standard online interaction. This hybrid offline-to-online paradigm significantly reduces the exploration burden and improves sample efficiency in sparse-reward settings. Experiments on benchmark tasks demonstrate that our method accelerates convergence and outperforms standard baselines, even with minimal or suboptimal demonstration data.
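
The abstract does not say how the value estimates are precomputed, so the following is only a minimal tabular sketch of the general idea, not the paper's implementation. It assumes that demonstrations are lists of (state, action, reward) steps, that the estimate for a demonstrated pair is its discounted Monte Carlo return-to-go, and that these estimates seed a Q-table which ordinary Q-learning then refines. The names `demo_returns`, `init_q_from_demos`, and `q_update` are hypothetical.

```python
def demo_returns(trajectory, gamma=0.99):
    """Discounted return-to-go for every (state, action, reward) step."""
    returns = []
    g = 0.0
    # Walk backwards so each step's return reuses the one after it.
    for state, action, reward in reversed(trajectory):
        g = reward + gamma * g
        returns.append((state, action, g))
    returns.reverse()
    return returns


def init_q_from_demos(demos, gamma=0.99):
    """Seed a Q-table from demonstrations (assumed initialization scheme).

    If a (state, action) pair appears more than once, the largest
    observed return is kept.
    """
    q = {}
    for trajectory in demos:
        for state, action, g in demo_returns(trajectory, gamma):
            key = (state, action)
            if key not in q or g > q[key]:
                q[key] = g
    return q


def q_update(q, s, a, r, s_next, actions, done, alpha=0.1, gamma=0.99):
    """One standard online Q-learning step; unseen pairs default to 0."""
    if done:
        target = r
    else:
        target = r + gamma * max(q.get((s_next, b), 0.0) for b in actions)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (target - old)
```

In a sparse-reward chain where only the final step is rewarded, the seeded table already assigns non-zero values to every demonstrated state-action pair, so the first online updates bootstrap from informative targets rather than from zeros. The abstract also describes using the estimates as targets for early learning; that variant would regress toward the same returns rather than copying them into the table.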
