A Search-Based Testing Approach for Deep Reinforcement Learning Agents

  • 2023-01-14 18:31:43
  • Amirhossein Zolfagharian, Manel Abdellatif, Lionel Briand, Mojtaba Bagherzadeh, Ramesh S


Deep Reinforcement Learning (DRL) algorithms have been increasingly employed during the last decade to solve various decision-making problems such as autonomous driving and robotics. However, these algorithms have faced great challenges when deployed in safety-critical environments, since they often exhibit erroneous behaviors that can lead to potentially critical errors. One way to assess the safety of DRL agents is to test them to detect possible faults leading to critical failures during their execution. This raises the question of how we can efficiently test DRL policies to ensure their correctness and adherence to safety requirements. Most existing works on testing DRL agents use adversarial attacks that perturb the states or actions of the agent. However, such attacks often lead to unrealistic states of the environment, and their main goal is to test the robustness of DRL agents rather than the compliance of the agents' policies with requirements. Due to the huge state space of DRL environments, the high cost of test execution, and the black-box nature of DRL algorithms, exhaustive testing of DRL agents is impossible. In this paper, we propose a Search-based Testing Approach of Reinforcement Learning Agents (STARLA) to test the policy of a DRL agent by effectively searching for failing executions of the agent within a limited testing budget. We use machine learning models and a dedicated genetic algorithm to narrow the search towards faulty episodes. We apply STARLA to Deep-Q-Learning agents, which are widely used as benchmarks, and show that it significantly outperforms Random Testing by detecting more faults related to the agent's policy. We also investigate how to extract rules that characterize faulty episodes of the DRL agent using our search results. Such rules can be used to understand the conditions under which the agent fails and thus assess its deployment risks.
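The abstract does not specify STARLA's episode encoding, fitness functions, or the machine learning models it uses, so the following is only a minimal, hypothetical sketch of the general idea: a genetic algorithm that spends a fixed budget of episode executions and is steered toward failing episodes, compared against random testing under the same budget. Everything here is an assumption for illustration: the toy 1D environment, the hand-written `policy` standing in for a trained Deep-Q-Learning agent, the two searched parameters (initial position and a constant "wind" disturbance), and the fitness (lowest position reached, where below 0 counts as a failure). STARLA itself uses ML models to estimate fault probability; this sketch substitutes a direct simulation-based proxy.

```python
import random

STEPS = 20                 # hypothetical episode length
START_RANGE = (0.0, 10.0)  # hypothetical bounds for the initial position
WIND_RANGE = (0.0, 0.6)    # hypothetical bounds for a constant disturbance


def policy(pos):
    """Toy stand-in for a trained DRL policy: push right until position 8."""
    return 1.0 if pos < 8.0 else 0.0


def run_episode(start, wind):
    """Execute one episode; return the lowest position reached (< 0 is a failure)."""
    pos = lowest = start
    for _ in range(STEPS):
        pos += 0.5 * policy(pos) - wind
        lowest = min(lowest, pos)
    return lowest


def is_faulty(ind):
    return run_episode(*ind) < 0.0


def clamp(x, lo, hi):
    return max(lo, min(hi, x))


def random_individual(rng):
    return (rng.uniform(*START_RANGE), rng.uniform(*WIND_RANGE))


def mutate(ind, rng):
    # Gaussian perturbation of each gene, clamped to the search bounds
    start, wind = ind
    return (clamp(start + rng.gauss(0, 0.5), *START_RANGE),
            clamp(wind + rng.gauss(0, 0.03), *WIND_RANGE))


def crossover(a, b, rng):
    # Uniform crossover: each gene is taken from either parent
    return tuple(x if rng.random() < 0.5 else y for x, y in zip(a, b))


def genetic_search(budget, pop_size=20, seed=0):
    """Return the distinct faulty episodes found within `budget` executions."""
    rng = random.Random(seed)
    evaluated = []  # (fitness, individual) for every executed episode

    def evaluate(ind):
        entry = (run_episode(*ind), ind)
        evaluated.append(entry)
        return entry

    pop = [evaluate(random_individual(rng)) for _ in range(min(pop_size, budget))]
    while len(evaluated) < budget:
        children = []
        for _ in range(pop_size):
            if len(evaluated) >= budget:
                break
            # Tournament selection: lower fitness means closer to failure
            p1 = min(rng.sample(pop, 3))[1]
            p2 = min(rng.sample(pop, 3))[1]
            children.append(evaluate(mutate(crossover(p1, p2, rng), rng)))
        # Elitist survival: keep the pop_size episodes closest to failure
        pop = sorted(pop + children)[:pop_size]
    return sorted({ind for fit, ind in evaluated if fit < 0.0})


def random_testing(budget, seed=0):
    """Baseline: sample episodes uniformly and keep the faulty ones."""
    rng = random.Random(seed)
    samples = (random_individual(rng) for _ in range(budget))
    return [ind for ind in samples if is_faulty(ind)]


if __name__ == "__main__":
    ga_faults = genetic_search(budget=500)
    rt_faults = random_testing(budget=500)
    print("GA faults:", len(ga_faults), "random faults:", len(rt_faults))
```

In this toy setting failures only occur in a small corner of the search space (strong wind combined with a low start), which is where guided search tends to help relative to uniform sampling. Deduplication is applied to the GA results because clamping can produce repeated boundary individuals.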

