Abstract
Learning how to act when there are many available actions in each state is a challenging task for Reinforcement Learning (RL) agents, especially when many of the actions are redundant or irrelevant. In such cases, it is sometimes easier to learn which actions not to take. In this work, we propose the Action-Elimination Deep Q-Network (AE-DQN) architecture that combines a Deep RL algorithm with an Action Elimination Network (AEN) that eliminates sub-optimal actions. The AEN is trained to predict invalid actions, supervised by an external elimination signal provided by the environment. Simulations demonstrate a considerable speedup and added robustness over vanilla DQN in text-based games with over a thousand discrete actions.
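To illustrate the idea of acting only over non-eliminated actions, the following is a minimal sketch and not the AE-DQN implementation. It assumes the Q-network and the AEN each output one score per action, and it uses a fixed probability threshold as a stand-in for the elimination rule, which the abstract does not specify. The names `select_action`, `invalid_prob`, and `threshold`, as well as the fallback when every action is eliminated, are hypothetical.

```python
from typing import Sequence


def select_action(q_values: Sequence[float],
                  invalid_prob: Sequence[float],
                  threshold: float = 0.5) -> int:
    """Return the greedy action among those that survive elimination.

    q_values:     one Q-value estimate per action (assumed DQN output).
    invalid_prob: predicted probability that each action is invalid
                  (assumed AEN output).
    threshold:    hypothetical cutoff; actions above it are eliminated.
    """
    if len(q_values) != len(invalid_prob):
        raise ValueError("q_values and invalid_prob must have the same length")

    # Keep only actions the elimination model does not flag as invalid.
    admissible = [a for a, p in enumerate(invalid_prob) if p <= threshold]

    # Assumption: if every action is eliminated, fall back to the full set.
    if not admissible:
        admissible = list(range(len(q_values)))

    # Greedy choice restricted to the admissible actions.
    return max(admissible, key=lambda a: q_values[a])
```

For example, `select_action([1.0, 5.0, 3.0], [0.1, 0.9, 0.2])` skips action 1 despite its high Q-value, because the elimination score flags it as likely invalid, and picks action 2 instead.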