Abstract
Sample inefficiency of deep reinforcement learning methods is a major obstacle for their use in real-world applications. In this work, we show how human demonstrations can improve final performance of agents on the Minecraft minigame ObtainDiamond with only 8M frames of environment interaction. We propose a training procedure where policy networks are first trained on human data and later fine-tuned by reinforcement learning. Using a policy exploitation mechanism, experience replay and an additional loss against catastrophic forgetting, our best agent was able to achieve a mean score of 48. Our proposed solution placed 3rd in the NeurIPS MineRL Competition for Sample-Efficient Reinforcement Learning.