Abstract
Reinforcement learning (RL) algorithms interact with their environment in a trial-and-error fashion. Such interactions can be expensive, inefficient, and time-consuming when learning on a physical system rather than in a simulation. This work develops new runtime verification techniques to predict when the learning phase has not met or will not meet qualitative and timely expectations. This paper presents three verification properties concerning the quality and timeliness of learning in RL algorithms. For each property, we propose design steps for monitoring and assessing it during the system's operation.