Real-Time Recurrent Reinforcement Learning

Abstract

Recent advances in reinforcement learning for partially observable Markov decision processes (POMDPs) rely on the biologically implausible backpropagation through time algorithm (BPTT) to perform gradient-descent optimisation. In this paper we propose a novel reinforcement learning algorithm that makes use of random feedback local online learning (RFLO), a biologically plausible approximation of real-time recurrent learning (RTRL), to compute the gradients of the parameters of a recurrent neural network in an online manner. By combining it with TD($\lambda$), a variant of temporal-difference reinforcement learning with eligibility traces, we create a biologically plausible, recurrent actor-critic algorithm capable of solving discrete and continuous control tasks in POMDPs. We compare BPTT, RTRL and RFLO as well as different network architectures, and find that RFLO can perform just as well as RTRL while exceeding even BPTT in terms of complexity. The proposed method, called real-time recurrent reinforcement learning (RTRRL), serves as a model of learning in biological neural networks, mimicking reward pathways in the mammalian brain.
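
To illustrate the TD($\lambda$) component mentioned above, the following is a minimal sketch of tabular TD($\lambda$) with accumulating eligibility traces. It is not the RTRRL algorithm: there is no recurrent network, no RFLO and no actor. The random-walk task, the function name `td_lambda_random_walk` and all hyperparameter values are assumptions chosen purely for illustration.

```python
import numpy as np


def td_lambda_random_walk(n_states=5, episodes=3000, alpha=0.02,
                          gamma=1.0, lam=0.8, seed=0):
    """Tabular TD(lambda) on a hypothetical 1-D random walk (illustrative only).

    The agent starts in the middle state and moves left or right with equal
    probability. Leaving on the right gives reward 1, on the left reward 0.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(n_states)                # value estimate per state
    for _ in range(episodes):
        s = n_states // 2
        e = np.zeros(n_states)            # eligibility trace, reset each episode
        while True:
            s_next = s + (1 if rng.random() < 0.5 else -1)
            done = s_next < 0 or s_next >= n_states
            r = 1.0 if s_next >= n_states else 0.0
            v_next = 0.0 if done else w[s_next]
            delta = r + gamma * v_next - w[s]   # TD error
            e *= gamma * lam                    # decay all traces
            e[s] += 1.0                         # accumulate trace for current state
            w += alpha * delta * e              # online update, no stored trajectory
            if done:
                break
            s = s_next
    return w


if __name__ == "__main__":
    print(td_lambda_random_walk())
```

The point of the sketch is that each update uses only the current TD error and a running trace, so learning proceeds online at every step without replaying the episode, which is the property the abstract relies on when pairing TD($\lambda$) with an online gradient method.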
