CM-DQN: A Value-Based Deep Reinforcement Learning Model to Simulate Confirmation Bias

Abstract

In human decision-making tasks, individuals learn through trials and prediction errors. While learning a task, some individuals are more influenced by good outcomes, while others weigh bad outcomes more heavily. Such confirmation bias can lead to different learning effects. In this study, we propose a new Deep Reinforcement Learning algorithm, CM-DQN, which applies different update strategies to positive and negative prediction errors in order to simulate the human decision-making process when the task's states are continuous and its actions are discrete. We test it in the Lunar Lander environment under confirmatory bias, disconfirmatory bias, and no bias to observe the learning effects. Moreover, as a contrast experiment, we apply the confirmation model, which uses the same idea as our proposed algorithm, to a multi-armed bandit problem (an environment with discrete states and discrete actions) to algorithmically simulate the impact of different confirmation biases on the decision-making process. In both experiments, confirmatory bias yields a better learning effect. Our code is available at https://github.com/Patrickhshs/CM-DQN.
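
The idea of using different update strategies for positive and negative prediction errors can be illustrated in the bandit setting. The following is a minimal sketch and not the authors' implementation: the function names, the epsilon-greedy policy, the Bernoulli rewards, and all parameter values are assumptions made for illustration. Setting `alpha_pos > alpha_neg` is taken here to stand for a confirmatory bias, `alpha_pos < alpha_neg` for a disconfirmatory bias, and equal rates for the non-biased case.

```python
import random


def asymmetric_update(q, reward, alpha_pos, alpha_neg):
    """Update one value estimate with a learning rate picked by the
    sign of the prediction error (hypothetical helper)."""
    delta = reward - q                      # prediction error
    alpha = alpha_pos if delta > 0 else alpha_neg
    return q + alpha * delta


def run_bandit(probs, alpha_pos, alpha_neg, steps=1000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy agent on a Bernoulli bandit.

    probs: assumed reward probability of each arm.
    Returns the final value estimates and the total reward collected.
    """
    rng = random.Random(seed)
    q = [0.0] * len(probs)
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:          # explore
            arm = rng.randrange(len(probs))
        else:                               # exploit current estimates
            arm = max(range(len(probs)), key=lambda i: q[i])
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        q[arm] = asymmetric_update(q[arm], reward, alpha_pos, alpha_neg)
        total_reward += reward
    return q, total_reward


if __name__ == "__main__":
    # Illustrative settings only; these values are not from the paper.
    for label, a_pos, a_neg in [("confirmatory", 0.3, 0.1),
                                ("non-biased", 0.2, 0.2),
                                ("disconfirmatory", 0.1, 0.3)]:
        q, total = run_bandit([0.3, 0.7], a_pos, a_neg)
        print(label, [round(v, 3) for v in q], total)
```

Only the chosen arm is updated in this sketch. In a deep, value-based setting the same sign-dependent weighting would presumably be applied to the temporal-difference error of the network's target, but that mapping is an assumption here and the details should be taken from the paper and its repository.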
