Abstract
Reinforcement learning (RL) has revolutionized decision-making processes in dynamic environments, yet it often struggles with autonomously detecting and achieving goals without clear feedback signals. For example, in a Source Term Estimation problem, the lack of precise environmental information makes it challenging to provide clear feedback signals and to define and evaluate how the source's location is determined. To address this challenge, the Autonomous Goal Detection and Cessation (AGDC) module was developed, enhancing various RL algorithms by incorporating a self-feedback mechanism for autonomous goal detection and cessation upon task completion. By approximating the agent's belief, our method effectively detects goals that are not explicitly defined and ceases once they are achieved, significantly enhancing the capabilities of RL algorithms in environments with limited feedback. To validate the effectiveness of our approach, we integrated AGDC with deep Q-network, proximal policy optimization, and deep deterministic policy gradient algorithms, and evaluated its performance on the Source Term Estimation problem. The experimental results showed that AGDC-enhanced RL algorithms significantly outperformed traditional statistical methods such as infotaxis, entrotaxis, and dual control for exploitation and exploration, as well as a non-statistical random action selection method. These improvements were evident in terms of success rate, mean traveled distance, and search time, highlighting AGDC's effectiveness and efficiency in complex, real-world scenarios.