Fuzzy Logic Guided Reward Function Variation: An Oracle for Testing Reinforcement Learning Programs

Abstract

Reinforcement Learning (RL) has gained significant attention across various domains. However, the increasing complexity of RL programs presents testing challenges, particularly the oracle problem: defining the correctness of the RL program. Conventional human oracles struggle to cope with the complexity, leading to inefficiencies and potential unreliability in RL testing. To alleviate this problem, we propose an automated oracle approach that leverages RL properties using fuzzy logic. Our oracle quantifies an agent's behavioral compliance with reward policies and analyzes its trend over training episodes. It labels an RL program as "Buggy" if the compliance trend violates expectations derived from RL characteristics. We evaluate our oracle on RL programs with varying complexities and compare it with human oracles. Results show that while human oracles perform well in simpler testing scenarios, our fuzzy oracle demonstrates superior performance in complex environments. The proposed approach shows promise in addressing the oracle problem for RL testing, particularly in complex cases where manual testing falls short. It offers a potential solution to improve the efficiency, reliability, and scalability of RL program testing. This research takes a step towards automated testing of RL programs and highlights the potential of fuzzy logic-based oracles in tackling the oracle problem.
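
To make the idea of a compliance-trend oracle concrete, the following is a minimal illustrative sketch and not the implementation evaluated in the paper. It assumes a linear-ramp fuzzy membership function over per-step rewards, the mean membership as an episode's compliance score, and a least-squares slope as the trend measure; the function names, the `low`/`high` reward bounds, the `min_slope` threshold, and the "Not buggy" label are all hypothetical choices made for illustration. The expectation encoded here (compliance should rise as training proceeds) is one plausible reading of "expectations derived from RL characteristics".

```python
from statistics import mean


def compliance_degree(reward, low, high):
    """Fuzzy membership in [0, 1] for a single step's reward.

    Assumption: a linear ramp from `low` (no compliance) to `high`
    (full compliance). The paper's actual membership functions may differ.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    return min(1.0, max(0.0, (reward - low) / (high - low)))


def episode_compliance(rewards, low, high):
    """Mean fuzzy compliance over the steps of one episode."""
    return mean(compliance_degree(r, low, high) for r in rewards)


def trend_slope(values):
    """Least-squares slope of `values` against their episode index."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two episodes to estimate a trend")
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    numerator = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


def fuzzy_oracle(episodes, low, high, min_slope=0.0):
    """Label a training run from its per-episode reward traces.

    Assumption: a correct RL program shows a compliance trend with slope
    strictly above `min_slope`; anything else is labeled "Buggy".
    """
    scores = [episode_compliance(ep, low, high) for ep in episodes]
    return "Buggy" if trend_slope(scores) <= min_slope else "Not buggy"
```

As a usage example under these assumptions, `fuzzy_oracle([[-1, -1, 0], [0, 0, 1], [1, 1, 1]], low=-1, high=1)` sees compliance rise across episodes, while a run whose rewards never change yields a flat trend and is flagged.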
