Abstract
During sleep and awake rest, the hippocampus replays sequences of place cells that have been activated during prior experiences. These replays have been interpreted as a memory consolidation process, but recent results suggest a possible interpretation in terms of reinforcement learning. The Dyna reinforcement learning algorithms use off-line replays to improve learning. Under a limited replay budget, a prioritized sweeping approach, which requires a model of the transitions to the predecessors, can be used to improve performance. We investigate whether such algorithms can explain the experimentally observed replays. We propose a neural network version of prioritized sweeping Q-learning, for which we developed a growing multiple expert algorithm able to cope with multiple predecessors. The resulting architecture is able to improve the learning of simulated agents confronted with a navigation task. We predict that, in animals, learning the world model should occur during rest periods, and that the corresponding replays should be shuffled.
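
To make the role of the predecessor model concrete, the following is a minimal tabular sketch of prioritized sweeping Q-learning under a replay budget. It is not the neural network architecture proposed here: it assumes a known, deterministic transition model given as a dictionary, and the function name `prioritized_sweeping` and the parameters `theta` (priority threshold) and `budget` (maximum number of replays) are illustrative choices rather than names taken from this work.

```python
import heapq
import itertools
from collections import defaultdict


def prioritized_sweeping(transitions, n_actions, gamma=0.9, alpha=1.0,
                         theta=1e-4, budget=50):
    """Tabular prioritized sweeping on a known deterministic model.

    transitions maps (state, action) -> (reward, next_state).
    States that never appear as a key are treated as terminal (value 0).
    """
    q = defaultdict(float)

    # Predecessor model: for each state, which (state, action) pairs lead to it.
    predecessors = defaultdict(set)
    for (s, a), (_, s_next) in transitions.items():
        predecessors[s_next].add((s, a))

    def td_error(s, a):
        r, s_next = transitions[(s, a)]
        best_next = max(q[(s_next, b)] for b in range(n_actions))
        return r + gamma * best_next - q[(s, a)]

    tie_breaker = itertools.count()  # keeps heap entries comparable
    queue = []

    def push_if_surprising(s, a):
        priority = abs(td_error(s, a))
        if priority > theta:
            # heapq is a min-heap, so store the negated priority.
            heapq.heappush(queue, (-priority, next(tie_breaker), s, a))

    for (s, a) in transitions:
        push_if_surprising(s, a)

    replays = 0
    while queue and replays < budget:
        _, _, s, a = heapq.heappop(queue)
        delta = td_error(s, a)  # recompute: the queued priority may be stale
        if abs(delta) <= theta:
            continue
        q[(s, a)] += alpha * delta
        replays += 1
        # Propagate the change backwards through the predecessor model.
        for (ps, pa) in predecessors[s]:
            push_if_surprising(ps, pa)

    return dict(q), replays
```

On a chain of states with a single rewarded final transition, this sketch replays the transitions in reverse order, starting from the rewarded one, so each replay is spent where the value estimate changes most.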