Abstract
Reinforcement Learning (RL) is an area of machine learning concerned with how agents take actions in an unknown environment to maximize their rewards. Unlike the classical Markov Decision Process (MDP) setting, in which the agent has full knowledge of its states, rewards, and transition probabilities, reinforcement learning relies on exploration and exploitation to cope with model uncertainty. Because the model usually has a large state space, a neural network (NN) can be used to map input states to output actions so as to maximize the agent's rewards. However, building and training an efficient neural network is challenging. Inspired by Double Q-learning and the Asynchronous Advantage Actor-Critic (A3C) algorithm, for our project we propose and implement an improved Double A3C algorithm that combines the strengths of both algorithms to play OpenAI Gym Atari 2600 games, with the goal of beating their benchmarks.