Abstract
Nearly all state-of-the-art deep learning algorithms rely on error backpropagation, which is generally regarded as biologically implausible. An alternative way of training an artificial neural network is to treat each unit in the network as a reinforcement learning agent, so that the network is considered a team of agents. All units can then be trained by REINFORCE, a local learning rule modulated by a global signal that is more consistent with biologically observed forms of synaptic plasticity. Although this learning rule follows the gradient of return in expectation, it suffers from high variance and therefore learns slowly, making it impractical for training deep networks. We therefore propose a novel algorithm called MAP propagation that reduces this variance significantly while retaining the local property of the learning rule. Experiments demonstrated that MAP propagation could solve common reinforcement learning tasks at a speed similar to that of backpropagation when applied to an actor-critic network. Our work thus allows for the broader application of teams of agents in deep reinforcement learning.