Split Q Learning: Reinforcement Learning with Two-Stream Rewards

Abstract

Drawing inspiration from behavioral studies of human decision making, we propose here a general parametric framework for a reinforcement learning problem, which extends the standard Q-learning approach to incorporate a two-stream framework of reward processing with biases biologically associated with several neurological and psychiatric conditions, including Parkinson's and Alzheimer's diseases, attention-deficit/hyperactivity disorder (ADHD), addiction, and chronic pain. For the AI community, the development of agents that react differently to different types of rewards can enable us to understand a wide spectrum of multi-agent interactions in complex real-world socioeconomic systems. Moreover, from the behavioral modeling perspective, our parametric framework can be viewed as a first step towards a unifying computational model capturing reward processing abnormalities across multiple mental conditions and user preferences in long-term recommendation systems.
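
The abstract does not spell out the update rule, so the following is only a minimal sketch under stated assumptions. It assumes the two streams are realized as two tabular Q-tables: one updated with the positive part of each reward and one with the negative part. Each table gets its own hypothetical reward weight (`w_pos`, `w_neg`) and memory weight on its previous estimate (`lam_pos`, `lam_neg`), and actions are chosen epsilon-greedily from the sum of the two tables. The class and all parameter names are illustrative, not taken from the paper.

```python
import random
from collections import defaultdict


class SplitQAgent:
    """Illustrative tabular agent with separate Q-tables for positive and
    negative rewards (a sketch, not the paper's reference implementation).

    Hypothetical parameters:
      w_pos, w_neg     -- scale the positive / negative reward streams
      lam_pos, lam_neg -- scale each table's previous estimate ("memory")
    With every weight at 1.0, each table receives an ordinary Q-learning
    update on its own part of the reward.
    """

    def __init__(self, n_actions, alpha=0.1, gamma=0.95, epsilon=0.1,
                 w_pos=1.0, w_neg=1.0, lam_pos=1.0, lam_neg=1.0, seed=0):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.w_pos, self.w_neg = w_pos, w_neg
        self.lam_pos, self.lam_neg = lam_pos, lam_neg
        # Each table maps a state to a list of per-action values (default 0.0).
        self.q_pos = defaultdict(lambda: [0.0] * n_actions)
        self.q_neg = defaultdict(lambda: [0.0] * n_actions)
        self.rng = random.Random(seed)

    def combined(self, state):
        """Action values used for decisions: sum of the two streams."""
        return [p + n for p, n in zip(self.q_pos[state], self.q_neg[state])]

    def act(self, state):
        """Epsilon-greedy choice over the combined values."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.combined(state)
        return max(range(self.n_actions), key=values.__getitem__)

    def update(self, state, action, reward, next_state, done=False):
        # Split the scalar reward into a non-negative and a non-positive part.
        r_pos, r_neg = max(reward, 0.0), min(reward, 0.0)
        self._update_table(self.q_pos, self.lam_pos, self.w_pos * r_pos,
                           state, action, next_state, done)
        self._update_table(self.q_neg, self.lam_neg, self.w_neg * r_neg,
                           state, action, next_state, done)

    def _update_table(self, q, lam, r, state, action, next_state, done):
        # Bootstrapped target uses this table's own max over next actions;
        # terminal transitions have no future value.
        future = 0.0 if done else self.gamma * max(q[next_state])
        old = q[state][action]
        q[state][action] = lam * old + self.alpha * (r + future - old)
```

Varying these weights, for example amplifying `w_neg` relative to `w_pos`, is one way such a sketch could express an agent that reacts more strongly to punishments than to rewards; which settings would correspond to which condition is not specified here.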
