Abstract
In this paper we introduce a reinforcement learning (RL) approach for training policies, including artificial neural network policies, that is both backpropagation-free and clock-free. It is backpropagation-free in that it does not propagate any information backwards through the network. It is clock-free in that no signal is given to each node in the network to specify when it should compute its output and when it should update its weights. We contend that these two properties increase the biological plausibility of our algorithms and facilitate distributed implementations. Additionally, our approach eliminates the need for customized learning rules for hierarchical RL algorithms like the option-critic.
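The abstract does not state the update rule, so the following is only a rough illustration of what "backpropagation-free" can mean for a network of stochastic units. It is not the paper's algorithm. Each Bernoulli-logistic unit applies a local REINFORCE-style update that uses only its own inputs, its own sampled output, and a reward broadcast to every unit. No error signal travels backwards from downstream units. The random per-unit update timing (`update_prob`) is only a loose stand-in for clock-free operation, because outputs are still computed in a synchronized forward pass. All names are hypothetical: the toy task (`CONTEXTS`, `TARGET`), `BernoulliUnit`, `train`, `success_prob`, and the hyperparameters.

```python
import math
import random
from itertools import product


def sigmoid(z: float) -> float:
    # Numerically safe logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class BernoulliUnit:
    """Hypothetical stochastic logistic unit that learns only from its own
    inputs, its own sampled output, and a broadcast reward signal."""

    def __init__(self, n_inputs: int):
        self.w = [0.0] * n_inputs
        self.last_inputs = [0.0] * n_inputs
        self.last_p = 0.5
        self.last_y = 0

    def act(self, inputs, rng):
        # Sample a binary output y ~ Bernoulli(sigmoid(w . inputs)).
        self.last_inputs = list(inputs)
        self.last_p = sigmoid(sum(wi * xi for wi, xi in zip(self.w, inputs)))
        self.last_y = 1 if rng.random() < self.last_p else 0
        return self.last_y

    def local_update(self, advantage, lr):
        # REINFORCE for a Bernoulli-logistic unit:
        # grad_w log pi(y | inputs) = (y - p) * inputs.
        # Only local quantities plus the global advantage are used.
        g = advantage * (self.last_y - self.last_p)
        self.w = [wi + lr * g * xi for wi, xi in zip(self.w, self.last_inputs)]


# Hypothetical two-context bandit; the last input entry is a bias term.
CONTEXTS = {0: [1.0, 0.0, 1.0], 1: [0.0, 1.0, 1.0]}
TARGET = {0: 1, 1: 0}  # rewarded action in each context


def train(steps=30000, n_hidden=4, lr=0.5, update_prob=0.5, seed=0):
    rng = random.Random(seed)
    hidden = [BernoulliUnit(3) for _ in range(n_hidden)]
    out = BernoulliUnit(n_hidden + 1)  # hidden outputs plus a bias input
    baseline = 0.5
    for _ in range(steps):
        c = rng.choice((0, 1))
        h = [u.act(CONTEXTS[c], rng) for u in hidden]
        a = out.act(h + [1.0], rng)
        r = 1.0 if a == TARGET[c] else 0.0
        adv = r - baseline
        baseline += 0.05 * (r - baseline)  # running-average reward baseline
        # Each unit independently decides whether to learn on this step;
        # there is no global "update now" signal.
        for u in hidden + [out]:
            if rng.random() < update_prob:
                u.local_update(adv, lr)
    return hidden, out


def success_prob(hidden, out, c):
    # Exact probability of the rewarded action in context c,
    # computed by enumerating every hidden-layer output pattern.
    ps = [sigmoid(sum(wi * xi for wi, xi in zip(u.w, CONTEXTS[c]))) for u in hidden]
    total = 0.0
    for h in product((0, 1), repeat=len(hidden)):
        p_h = math.prod(p if hi else 1.0 - p for p, hi in zip(ps, h))
        p_a1 = sigmoid(sum(wi * xi for wi, xi in zip(out.w, list(h) + [1.0])))
        total += p_h * (p_a1 if TARGET[c] == 1 else 1.0 - p_a1)
    return total
```

A possible usage: `hidden, out = train()` followed by `success_prob(hidden, out, 0)` and `success_prob(hidden, out, 1)`. These values should rise from 0.5 toward 1 as the hidden units come to encode the context. The hidden units learn this without ever receiving a gradient from the output unit.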