A Narration-based Reward Shaping Approach using Grounded Natural Language Commands

Abstract

While deep reinforcement learning techniques have led to agents that are successfully able to learn to perform a number of tasks that had been previously unlearnable, these techniques are still susceptible to the longstanding problem of reward sparsity. This is especially true for tasks such as training an agent to play StarCraft II, a real-time strategy game where reward is only given at the end of a game which is usually very long. While this problem can be addressed through reward shaping, such approaches typically require a human expert with specialized knowledge. Inspired by the vision of enabling reward shaping through the more-accessible paradigm of natural-language narration, we develop a technique that can provide the benefits of reward shaping using natural language commands. Our narration-guided RL agent projects sequences of natural-language commands into the same high-dimensional representation space as corresponding goal states. We show that we can get improved performance with our method compared to traditional reward-shaping approaches. Additionally, we demonstrate the ability of our method to generalize to unseen natural-language commands.
