Abstract
DQN (Deep Q-Network) is a method for performing Q-learning in reinforcement learning using deep neural networks. DQNs require large buffers for experience replay and rely on backpropagation-based iterative optimization, which makes them difficult to implement on resource-limited edge devices. In this paper, we propose a lightweight on-device reinforcement learning approach for low-cost FPGA devices. It exploits a recently proposed neural-network-based on-device learning approach that does not rely on backpropagation but instead uses ELM (Extreme Learning Machine) and OS-ELM (Online Sequential ELM) based training algorithms. In addition, we propose a combination of L2 regularization and spectral normalization for on-device reinforcement learning, so that the output values of the neural network fit within a certain range and the reinforcement learning becomes stable. The proposed reinforcement learning approach is designed for the Xilinx PYNQ-Z1 board as a low-cost FPGA platform. Experimental results using OpenAI Gym demonstrate that the proposed algorithm and its FPGA implementation complete a CartPole-v0 task 29.76x and 126.06x faster, respectively, than a conventional DQN-based approach when the number of hidden-layer nodes is 64.
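To make the idea of backpropagation-free training concrete, the following is a minimal NumPy sketch of an OS-ELM style update with batch size 1, not the implementation evaluated in this paper. It rests on several assumptions that the abstract does not state: the class name `OSELMSketch`, the ReLU activation, the uniform weight initialization, the regularization strength `delta`, applying spectral normalization to the fixed random input weights, and applying L2 regularization through the initial matrix `P = I / delta`. With that initialization, the sequential updates reproduce the ridge-regression solution over all samples seen so far.

```python
import numpy as np


class OSELMSketch:
    """Illustrative single-hidden-layer network trained without backpropagation."""

    def __init__(self, n_in, n_hidden, n_out, delta=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Random input weights stay fixed after initialization (ELM property).
        alpha = rng.uniform(-1.0, 1.0, size=(n_in, n_hidden))
        # Spectral normalization (assumed placement): divide by the largest
        # singular value so the input-to-hidden map has spectral norm 1.
        self.alpha = alpha / np.linalg.norm(alpha, 2)
        self.bias = rng.uniform(-1.0, 1.0, size=n_hidden)
        # Output weights are the only trained parameters.
        self.beta = np.zeros((n_hidden, n_out))
        # L2 regularization enters through P0 = (delta * I)^-1.
        self.P = np.eye(n_hidden) / delta

    def hidden(self, x):
        # ReLU is an assumption; any fixed nonlinearity works for the sketch.
        return np.maximum(0.0, np.atleast_2d(x) @ self.alpha + self.bias)

    def predict(self, x):
        return self.hidden(x) @ self.beta

    def update(self, x, t):
        """One sequential update on a single sample (x, t)."""
        h = self.hidden(x)                      # shape (1, n_hidden)
        Ph = self.P @ h.T                       # shape (n_hidden, 1)
        # With batch size 1 the matrix inverse reduces to a scalar division.
        self.P -= (Ph @ Ph.T) / (1.0 + (h @ Ph).item())
        # Correct the output weights by the prediction error on this sample.
        self.beta += self.P @ h.T @ (np.atleast_2d(t) - h @ self.beta)
```

In a Q-learning setting, `x` would be an observed state and `t` a target Q-value vector, so each environment step triggers one `update` call instead of a replay-buffer minibatch and a gradient step. Because the per-sample update needs only matrix-vector products and one scalar division, it is the kind of computation that maps naturally onto a small FPGA.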