Abstract
Deep reinforcement learning is a technique for solving problems in a variety of environments, ranging from Atari video games to stock trading. This method leverages deep neural network models to make decisions based on observations of a given environment, with the goal of maximizing a reward function that can incorporate costs and rewards for reaching goals. For pathfinding, reward conditions can include reaching a specified target area along with costs for movement. In this work, multiple Deep Q-Network (DQN) agents are trained to operate in a partially observable environment with the goal of reaching a target zone in minimal travel time. The agent operates based on a visual representation of its surroundings, and thus has a restricted capability to observe the environment. A comparison between DQN, DQN-GRU, and DQN-LSTM is performed to examine each model's capabilities with two different types of input. This evaluation shows that, with equivalent training and analogous model architectures, a DQN model is able to outperform its recurrent counterparts.