DQN (Deep Q-Network) is a method for performing Q-learning in reinforcement learning using deep neural networks. DQNs require a large buffer and batch processing for experience replay and rely on backpropagation-based iterative optimization, which makes them difficult to implement on resource-limited edge devices. In this paper, we propose a lightweight on-device reinforcement learning approach for low-cost FPGA devices. It exploits a recently proposed neural-network-based on-device learning approach that does not rely on backpropagation but instead uses a training algorithm based on OS-ELM (Online Sequential Extreme Learning Machine). In addition, we propose a combination of L2 regularization and spectral normalization for on-device reinforcement learning so that the output values of the neural network fit within a certain range and the reinforcement learning becomes stable. The proposed reinforcement learning approach is designed for the PYNQ-Z1 board as a low-cost FPGA platform. Evaluation results using OpenAI Gym demonstrate that the proposed algorithm and its FPGA implementation complete a CartPole-v0 task 29.77x and 89.40x faster, respectively, than a conventional DQN-based approach when the number of hidden-layer nodes is 64.
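To illustrate the kind of backpropagation-free update that OS-ELM refers to, the following is a minimal NumPy sketch of the standard OS-ELM recursion with an L2-regularized initialization and spectrally normalized input weights. It is not the implementation evaluated in this paper: the class name `OSELM`, the ReLU activation, the choice to normalize only the fixed random input weights, and the parameter `l2` are all assumptions made for illustration.

```python
import numpy as np

class OSELM:
    """Illustrative single-hidden-layer OS-ELM (hypothetical helper, not the paper's code)."""

    def __init__(self, n_in, n_hidden, n_out, l2=1.0, seed=0):
        rng = np.random.default_rng(seed)
        alpha = rng.uniform(-1.0, 1.0, (n_in, n_hidden))
        # Spectral normalization (assumed form): divide the fixed random
        # input weights by their largest singular value.
        self.alpha = alpha / np.linalg.norm(alpha, 2)
        self.bias = rng.uniform(-1.0, 1.0, n_hidden)
        # Starting from P = I / l2 and beta = 0 makes the recursion
        # equivalent to ridge (L2-regularized) least squares.
        self.P = np.eye(n_hidden) / l2
        self.beta = np.zeros((n_hidden, n_out))

    def hidden(self, x):
        # ReLU hidden layer; the activation choice is an assumption.
        return np.maximum(0.0, np.atleast_2d(x) @ self.alpha + self.bias)

    def predict(self, x):
        return self.hidden(x) @ self.beta

    def update(self, x, t):
        """One sequential update; x is (n_in,) or (k, n_in), t is (n_out,) or (k, n_out)."""
        H = self.hidden(x)
        T = np.atleast_2d(t)
        # Only a k-by-k matrix is inverted (a scalar when k = 1).
        K = np.linalg.inv(np.eye(H.shape[0]) + H @ self.P @ H.T)
        self.P = self.P - self.P @ H.T @ K @ H @ self.P
        self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)
```

Only the output weights `beta` are trained, and each update with a single sample needs no gradient computation and no iterative optimization, which is the property that makes this family of methods attractive for small devices. In a Q-learning setting, `x` would be a state (or state-action encoding) and `t` a target Q-value; how those targets are built is outside this sketch.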