Leveraging human Domain Knowledge to model an empirical Reward function for a Reinforcement Learning problem

Abstract

Traditional Reinforcement Learning (RL) problems depend on an exhaustive simulation environment that models the real-world physics of the problem, and the RL agent is trained by observing this environment. In this paper, we present a novel approach to creating an environment by modeling the reward function on empirical rules extracted from human domain knowledge of the system under study. Using this empirical reward function, we build an environment and train the agent. We first create an environment that emulates the effect of setting cabin temperature through a thermostat. In RL problems this is typically done by creating an exhaustive model of the system from a detailed thermodynamic study. Instead, we propose an empirical approach that models the reward function on human domain knowledge. We document some rules of thumb that humans usually apply when setting the thermostat temperature and model them into our reward function. This modeling of empirical human domain rules into a reward function for RL is the unique aspect of this paper. Because this is a continuous action space problem, we use the deep deterministic policy gradient (DDPG) method to maximize the reward function. We create a policy network that predicts the optimal temperature setpoint given the external temperature and humidity.
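To make the idea of an empirical reward function concrete, the following is a minimal sketch of how thermostat rules of thumb might be encoded. It is not the reward function used in the paper: the function names, the 22 degree baseline, the humidity and cold-weather thresholds, and the squared-error shape are all hypothetical assumptions chosen for illustration.

```python
def preferred_setpoint(outside_temp_c: float, humidity_pct: float) -> float:
    """Rule-based target setpoint in deg C.

    All rules and numbers here are illustrative assumptions,
    not values taken from the paper.
    """
    target = 22.0  # assumed baseline comfort temperature
    if humidity_pct > 60.0:
        target -= 1.0  # assumed rule: humid air feels warmer, so set cooler
    if outside_temp_c < 5.0:
        target += 1.0  # assumed rule: on cold days, set slightly warmer
    return target


def empirical_reward(setpoint_c: float, outside_temp_c: float,
                     humidity_pct: float) -> float:
    """Reward for choosing `setpoint_c` under the given outside conditions.

    The reward is the negative squared distance from the rule-based target,
    so it is at most 0 and reaches 0 only when the action matches the target.
    """
    error = setpoint_c - preferred_setpoint(outside_temp_c, humidity_pct)
    return -(error ** 2)
```

Under these assumptions, a continuous-action agent such as DDPG would be trained to choose, for each observed pair of external temperature and humidity, the setpoint that maximizes `empirical_reward`, without any thermodynamic model of the cabin.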
