$β$-DQN: Improving Deep Q-Learning By Evolving the Behavior

  • 2025-10-28 15:26:34
  • Hongming Zhang, Fengshuo Bai, Chenjun Xiao, Chao Gao, Bo Xu, Martin Müller

Abstract

While many sophisticated exploration methods have been proposed, their lack of generality and high computational cost often lead researchers to favor simpler methods like $\epsilon$-greedy. Motivated by this, we introduce $\beta$-DQN, a simple and efficient exploration method that augments the standard DQN with a behavior function $\beta$. This function estimates the probability that each action has been taken at each state. By leveraging $\beta$, we generate a population of diverse policies that balance exploration between state-action coverage and overestimation bias correction. An adaptive meta-controller is designed to select an effective policy for each episode, enabling flexible and explainable exploration. $\beta$-DQN is straightforward to implement and adds minimal computational overhead to the standard DQN. Experiments on both simple and challenging exploration domains show that $\beta$-DQN outperforms existing baseline methods across a wide range of tasks, providing an effective solution for improving exploration in deep reinforcement learning.
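The abstract names three components: a behavior function $\beta$, a population of policies built from it, and a per-episode meta-controller. The Python sketch below is a minimal tabular illustration under our own assumptions, not the authors' implementation. $\beta$ is approximated by visit counts rather than a learned network. Each hypothetical policy mixes a least-taken ("coverage") action with a greedy Q action, which acts on possibly overestimated values so that the resulting data corrects them. The meta-controller is assumed to be a sliding-window UCB bandit scored by episode return. The names `BehaviorFunction`, `make_policy`, `SlidingWindowUCB` and the parameters `mix`, `window`, `c` are all hypothetical.

```python
import math
import random
from collections import defaultdict, deque


class BehaviorFunction:
    """Tabular stand-in for beta(a|s): the empirical frequency with which
    action a has been taken at state s. (Assumption: counts replace the
    learned estimator used in the paper.)"""

    def __init__(self, n_actions):
        self.n_actions = n_actions
        self.counts = defaultdict(lambda: [0] * n_actions)

    def update(self, state, action):
        self.counts[state][action] += 1

    def probs(self, state):
        counts = self.counts[state]
        total = sum(counts)
        if total == 0:
            # Unvisited state: every action is treated as equally explored.
            return [1.0 / self.n_actions] * self.n_actions
        return [c / total for c in counts]


def coverage_action(beta_s):
    """Pick the action taken least often at this state (favors coverage)."""
    return min(range(len(beta_s)), key=lambda a: beta_s[a])


def greedy_action(q_s):
    """Pick the action with the highest Q estimate. Acting on possibly
    overestimated values gathers data that corrects them."""
    return max(range(len(q_s)), key=lambda a: q_s[a])


def make_policy(mix, rng):
    """Hypothetical population member: with probability `mix` take the
    coverage action, otherwise the greedy action."""
    def act(q_s, beta_s):
        if rng.random() < mix:
            return coverage_action(beta_s)
        return greedy_action(q_s)
    return act


class SlidingWindowUCB:
    """Meta-controller sketch (assumed design): UCB over policy indices,
    using only the most recent `window` episodes so the choice can follow
    a changing learner."""

    def __init__(self, n_arms, window=50, c=1.0):
        self.n_arms = n_arms
        self.c = c
        self.history = deque(maxlen=window)  # (arm, episode_return) pairs

    def select(self):
        counts = [0] * self.n_arms
        sums = [0.0] * self.n_arms
        for arm, ret in self.history:
            counts[arm] += 1
            sums[arm] += ret
        for arm in range(self.n_arms):
            if counts[arm] == 0:
                return arm  # try each arm at least once within the window
        total = len(self.history)
        return max(
            range(self.n_arms),
            key=lambda i: sums[i] / counts[i]
            + self.c * math.sqrt(math.log(total) / counts[i]),
        )

    def update(self, arm, episode_return):
        self.history.append((arm, episode_return))


# Example population spanning pure greedy (mix=0.0) to pure coverage (mix=1.0).
rng = random.Random(0)
policies = [make_policy(m, rng) for m in (0.0, 0.25, 0.5, 1.0)]
meta = SlidingWindowUCB(len(policies))
```

In a training loop, each episode would call `meta.select()` to pick an index `k`, act with `policies[k](q_s, beta.probs(s))` where `q_s` comes from the DQN, record every step with `beta.update(s, a)`, and finish with `meta.update(k, episode_return)`. The sliding window is one way to let the controller track a learner whose best policy shifts during training; it is a choice of this sketch, not something the abstract specifies.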
