Prioritized Sweeping Neural DynaQ with Multiple Predecessors, and Hippocampal Replays

Abstract

During sleep and awake rest, the hippocampus replays sequences of place cells that have been activated during prior experiences. These replays have been interpreted as a memory consolidation process, but recent results suggest a possible interpretation in terms of reinforcement learning. The Dyna reinforcement learning algorithms use off-line replays to improve learning. Under a limited replay budget, a prioritized sweeping approach, which requires a model of the transitions to the predecessors, can be used to improve performance. We investigate whether such algorithms can explain the experimentally observed replays. We propose a neural network version of prioritized sweeping Q-learning, for which we developed a growing multiple expert algorithm, able to cope with multiple predecessors. The resulting architecture is able to improve the learning of simulated agents confronted with a navigation task. We predict that, in animals, learning the world model should occur during rest periods, and that the corresponding replays should be shuffled.
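
To make the idea of prioritized sweeping with multiple predecessors concrete, the following is a minimal tabular sketch, not the neural network architecture proposed in the paper. It assumes a deterministic world model stored as a dictionary and a lookup table of predecessors; the hyperparameters (`GAMMA`, `ALPHA`, `THETA`), the function name `prioritized_sweeping`, and the four-state toy environment are all hypothetical choices made for illustration.

```python
import heapq
from collections import defaultdict

# Hypothetical hyperparameters chosen for the illustration only.
GAMMA, ALPHA, THETA = 0.9, 1.0, 1e-4


def prioritized_sweeping(Q, model, predecessors, actions, start, budget):
    """Replay at most `budget` (state, action) pairs, highest priority first.

    model:        dict mapping (s, a) -> (reward, next_state), deterministic
    predecessors: dict mapping s -> list of (s_prev, a_prev) that lead to s
    """
    def td_error(s, a):
        r, s2 = model[(s, a)]
        best_next = max(Q[(s2, b)] for b in actions)
        return r + GAMMA * best_next - Q[(s, a)]

    queue = []  # min-heap on negated priority, so the largest error pops first
    p = abs(td_error(*start))
    if p > THETA:
        heapq.heappush(queue, (-p, start))

    replayed = []
    while queue and len(replayed) < budget:
        _, (s, a) = heapq.heappop(queue)
        Q[(s, a)] += ALPHA * td_error(s, a)
        replayed.append((s, a))
        # A state may have several predecessors; each one is queued in turn.
        for ps, pa in predecessors.get(s, ()):
            p = abs(td_error(ps, pa))
            if p > THETA:
                heapq.heappush(queue, (-p, (ps, pa)))
    return replayed


# Toy environment (assumed): states 0 and 1 both lead to state 2,
# and state 2 leads to the rewarded absorbing state 3.
actions = (0,)
model = {(0, 0): (0.0, 2), (1, 0): (0.0, 2), (2, 0): (1.0, 3), (3, 0): (0.0, 3)}
predecessors = defaultdict(list)
for (s, a), (_, s2) in model.items():
    predecessors[s2].append((s, a))

Q = defaultdict(float)
replayed = prioritized_sweeping(Q, model, predecessors, actions, start=(2, 0), budget=10)
print(replayed)
```

In this toy run, the rewarded transition is replayed first and the value then propagates backward to both predecessors of state 2, which is the situation a single-predecessor model cannot handle.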
