Abstract
As the impact of real-time processing has become apparent in recent years, the need for efficient implementations of reinforcement learning (RL) algorithms has been on the rise. Despite the numerous advantages of the Bellman equations used in RL algorithms, they come with a large search space of design parameters. This research aims to shed light on the design space exploration associated with reinforcement learning parameters, specifically those of Policy Iteration. Given the large computational expense of fine-tuning the parameters of reinforcement learning algorithms, we propose an auto-tuner-based ordinal regression approach to accelerate the exploration of these parameters and, in turn, accelerate convergence towards an optimal policy. Our approach provides a 1.82x peak speedup and a 1.48x average speedup over the previous state-of-the-art.