DQN Performance with Epsilon Greedy Policies and Prioritized Experience Replay

Abstract

We present a detailed study of Deep Q-Networks in finite environments, emphasizing the impact of epsilon-greedy exploration schedules and prioritized experience replay. Through systematic experimentation, we evaluate how variations in epsilon decay schedules affect learning efficiency, convergence behavior, and reward optimization. We investigate how prioritized experience replay leads to faster convergence and higher returns, and show empirical results comparing no-replay, uniform-replay, and prioritized-replay strategies across multiple simulations. Our findings illuminate the trade-offs and interactions between exploration strategies and memory management in DQN training, offering practical recommendations for robust reinforcement learning in resource-constrained settings.
