Design and Comparison of Reward Functions in Reinforcement Learning for Energy Management of Sensor Nodes

Abstract

Interest in remote monitoring has grown thanks to recent advancements in Internet-of-Things (IoT) paradigms. New applications have emerged, using small devices called sensor nodes capable of collecting data from the environment and processing it. However, more and more data are processed and transmitted over longer operational periods. At the same time, battery technologies have not improved fast enough to cope with these increasing needs. This makes the energy consumption issue increasingly challenging, and thus miniaturized energy harvesting devices have emerged to complement traditional energy sources. Nevertheless, the harvested energy fluctuates significantly during node operation, increasing uncertainty about the energy resources actually available. Recently, energy management approaches have been developed, in particular using reinforcement learning. However, in reinforcement learning, the algorithm's performance relies greatly on the reward function. In this paper, we present two contributions. First, we explore five different reward functions to identify the most suitable variables to use in such functions to obtain the desired behaviour. Experiments were conducted using the Q-learning algorithm to adjust the energy consumption depending on the energy harvested. Results with the five reward functions illustrate how the choice thereof impacts the energy consumption of the node. Second, we propose two additional reward functions able to find a compromise between energy consumption and node performance using a non-fixed balancing parameter. Our simulation results show that the proposed reward functions adjust the node's performance depending on the battery level and reduce the learning time.
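
To make the idea of a reward with a non-fixed balancing parameter concrete, the following is a minimal sketch of tabular Q-learning in Python. It is not the paper's reward function: the names `reward`, `q_update` and `choose_action`, the use of the normalized battery level as the balancing weight, and the use of the duty cycle as a proxy for node performance are all assumptions made purely for illustration. Only the Q-learning update rule itself is standard.

```python
import random

def reward(battery_level, duty_cycle):
    """Hypothetical reward (an assumption, not the paper's formula).

    battery_level and duty_cycle are both normalized to [0, 1].
    The battery level acts as a non-fixed balancing weight: a full
    battery favours performance, an empty one favours energy saving.
    """
    beta = battery_level
    performance = duty_cycle      # proxy: higher duty cycle, more work done
    saving = 1.0 - duty_cycle     # proxy: lower duty cycle, less energy used
    return beta * performance + (1.0 - beta) * saving

def q_update(q, state, action, r, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update; returns the new Q-value."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (r + gamma * best_next - q[state][action])
    return q[state][action]

def choose_action(q, state, epsilon, rng=random):
    """Epsilon-greedy selection over the actions of one state."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    return max(range(len(q[state])), key=lambda a: q[state][a])
```

In such a sketch, states could be discretized battery levels and actions could be candidate duty cycles; the agent would call `choose_action`, observe the next battery level after harvesting and consumption, compute `reward`, and then call `q_update`.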
