Abstract
Reinforcement learning (RL) suffers from severe sample inefficiency, especially during early training, requiring extensive environmental interactions to perform competently. Existing methods tend to solve this by incorporating prior knowledge, but introduce significant architectural and implementation complexity. We propose Dynamic Action Interpolation (DAI), a universal yet straightforward framework that interpolates expert and RL actions via a time-varying weight $\alpha(t)$, integrating into any Actor-Critic algorithm with just a few lines of code and without auxiliary networks or additional losses. Our theoretical analysis shows that DAI reshapes state visitation distributions to accelerate value function learning while preserving convergence guarantees. Empirical evaluations across MuJoCo continuous control tasks demonstrate that DAI improves early-stage performance by over 160\% on average and final performance by more than 50\%, with the Humanoid task showing a 4$\times$ improvement early on and a 2$\times$ gain at convergence. These results challenge the assumption that complex architectural modifications are necessary for sample-efficient reinforcement learning.
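
As a rough illustration of the interpolation idea, the sketch below blends an expert action with an RL action using a time-varying weight. The abstract does not specify the form of $\alpha(t)$ or which action it weights, so the linear schedule, the convention that $\alpha(t)$ weights the RL action and grows from 0 to 1, and the names `alpha_linear`, `interpolate_action`, and `horizon` are all assumptions made here for illustration, not the paper's definitions.

```python
import numpy as np


def alpha_linear(t: int, horizon: int) -> float:
    """Hypothetical schedule: weight on the RL action, rising linearly from 0 to 1.

    The actual alpha(t) is not given in the abstract; this is an assumption.
    """
    return min(1.0, max(0.0, t / horizon))


def interpolate_action(a_expert, a_rl, alpha: float) -> np.ndarray:
    """Convex combination of the expert action and the RL action.

    Assumed convention: alpha = 0 executes the expert action,
    alpha = 1 executes the RL policy's action.
    """
    a_expert = np.asarray(a_expert, dtype=float)
    a_rl = np.asarray(a_rl, dtype=float)
    return (1.0 - alpha) * a_expert + alpha * a_rl


# Example: halfway through the assumed schedule, both actions count equally.
alpha = alpha_linear(t=50, horizon=100)
action = interpolate_action([1.0, -1.0], [0.0, 0.0], alpha)
```

Under these assumptions, the blended action would replace the policy's raw output only at environment-interaction time, which is consistent with the claim that no auxiliary networks or extra losses are required.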