Real-Time Recurrent Reinforcement Learning

Abstract

In this paper we propose real-time recurrent reinforcement learning (RTRRL), a biologically plausible approach to solving discrete and continuous control tasks in partially observable Markov decision processes (POMDPs). RTRRL consists of three parts: (1) a Meta-RL RNN architecture, implementing on its own an actor-critic algorithm; (2) an outer reinforcement learning algorithm, exploiting temporal difference learning and dutch eligibility traces to train the Meta-RL network; and (3) random-feedback local-online (RFLO) learning, an online automatic differentiation algorithm for computing the gradients with respect to parameters of the network. Our experimental results show that replacing the optimization algorithm in RTRRL with the biologically implausible backpropagation through time (BPTT) or real-time recurrent learning (RTRL) does not improve returns: BPTT only matches the computational complexity of RTRRL, and RTRL increases it. RTRRL thus serves as a model of learning in biological neural networks, mimicking reward pathways in the basal ganglia.
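To make part (2) of the abstract concrete, the following is a minimal sketch of temporal difference learning with dutch eligibility traces, in the textbook "true online TD(λ)" form for a linear value function. It is an illustration under stated assumptions, not the paper's implementation: the critic here is linear over one-hot features rather than a recurrent Meta-RL network, no actor and no RFLO gradient approximation are included, and the function name, the toy two-state chain, and all hyperparameter values are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def true_online_td_lambda(episodes, n_features, alpha=0.1, gamma=0.9, lam=0.8):
    """Learn linear value weights from episodes of (x, reward, x_next, done).

    Assumption: a linear critic v(s) = w . x(s); the paper trains an RNN instead.
    """
    w = [0.0] * n_features
    for episode in episodes:
        z = [0.0] * n_features  # dutch eligibility trace, reset per episode
        v_old = 0.0
        for x, reward, x_next, done in episode:
            v = dot(w, x)
            v_next = 0.0 if done else dot(w, x_next)
            delta = reward + gamma * v_next - v  # TD error
            # Dutch trace: decay, then add the feature vector scaled by a
            # correction term that depends on the current trace.
            zx = dot(z, x)
            z = [gamma * lam * zi + (1.0 - alpha * gamma * lam * zx) * xi
                 for zi, xi in zip(z, x)]
            # Weight update of true online TD(lambda).
            w = [wi + alpha * (delta + v - v_old) * zi - alpha * (v - v_old) * xi
                 for wi, zi, xi in zip(w, z, x)]
            v_old = v_next
    return w


# Hypothetical toy task: state 0 -> state 1 (reward 0) -> terminal (reward 1).
s0, s1 = [1.0, 0.0], [0.0, 1.0]
episode = [(s0, 0.0, s1, False), (s1, 1.0, s1, True)]
w = true_online_td_lambda([episode] * 500, n_features=2)
print(w)
```

With one-hot features the learned weights are the state-value estimates themselves, so on this chain they should approach 0.9 for the first state and 1.0 for the second when the discount factor is 0.9.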
