Tactical Reward Shaping: Bypassing Reinforcement Learning with Strategy-Based Goals

Abstract

Deep Reinforcement Learning (DRL) has shown a promising capability to learn optimal policies directly through trial and error. However, learning can be hindered if the goal of the learning, defined by the reward function, is "not optimal". We demonstrate that by setting the goal/target of competition in a counter-intuitive but intelligent way, instead of heuristically trying solutions over many hours, the DRL simulation can quickly converge to a winning strategy. The ICRA-DJI RoboMaster AI Challenge is a game of cooperation and competition between robots in a partially observable environment, quite similar to the game Counter-Strike. Unlike the traditional approach to games, where the reward is given for winning the match or hitting the enemy, our DRL algorithm rewards our robots when they are in a geometric-strategic advantage, which implicitly increases their chances of winning. Furthermore, we use Deep Q-Learning (DQL) to generate multi-agent movement paths, which improves cooperation between the two robots by avoiding collisions. Finally, we implement a variant of the A* algorithm with the same implicit geometric goal as DQL and compare the results. We conclude that a well-set goal can call into question the need for learning algorithms, with geometric-based searches outperforming DQL by many orders of magnitude.
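
The abstract does not specify how the A* variant encodes the geometric goal. As a minimal sketch, assuming a 4-connected occupancy grid and a hypothetical goal predicate (free cells whose Manhattan distance to the enemy lies within a chosen band, standing in for the paper's geometric-strategic advantage), a plain A* search that stops at any cell in a goal set could look like this. The names `advantage_cells` and `astar_to_goal_set` are illustrative, not the authors' code.

```python
import heapq
from typing import Optional

Cell = tuple[int, int]


def advantage_cells(grid: list[list[int]], enemy: Cell, min_d: int, max_d: int) -> set[Cell]:
    """HYPOTHETICAL goal predicate: free cells within a Manhattan-distance band of the enemy.
    This is only a stand-in for the paper's (unspecified) geometric-strategic advantage."""
    return {
        (r, c)
        for r in range(len(grid))
        for c in range(len(grid[0]))
        if grid[r][c] == 0 and min_d <= abs(r - enemy[0]) + abs(c - enemy[1]) <= max_d
    }


def astar_to_goal_set(grid: list[list[int]], start: Cell, goals: set[Cell]) -> Optional[list[Cell]]:
    """Plain A* on a 4-connected grid (grid[r][c] == 1 is an obstacle), stopping at any goal cell."""
    if not goals:
        return None
    rows, cols = len(grid), len(grid[0])

    def h(cell: Cell) -> int:
        # Minimum Manhattan distance to any goal: admissible and consistent for unit-cost moves.
        return min(abs(cell[0] - g[0]) + abs(cell[1] - g[1]) for g in goals)

    open_heap = [(h(start), 0, start)]  # entries are (f = g + h, g, cell)
    came_from: dict[Cell, Optional[Cell]] = {start: None}
    g_cost = {start: 0}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if g > g_cost[cell]:
            continue  # stale heap entry superseded by a cheaper path
        if cell in goals:
            # Walk parent links back to the start to rebuild the path.
            path = []
            node: Optional[Cell] = cell
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < g_cost.get(nxt, float("inf")):
                    g_cost[nxt] = ng
                    came_from[nxt] = cell
                    heapq.heappush(open_heap, (ng + h(nxt), ng, nxt))
    return None  # no goal cell is reachable
```

Taking the minimum distance over the whole goal set keeps the heuristic admissible, so the first goal cell popped is reached by a shortest path. Any richer advantage test (line of sight, cover, distance to allies) would only change how the goal set is built, not the search itself.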
