CoachNet: An Adversarial Sampling Approach for Reinforcement Learning

Abstract

Despite the recent successes of reinforcement learning in games and robotics, it has yet to become broadly practical. Poor sample efficiency and unreliable performance in rare but challenging scenarios are two of the major obstacles. Drawing inspiration from the effectiveness of deliberate practice for achieving expert-level human performance, we propose a new adversarial sampling approach guided by a failure predictor named "CoachNet". CoachNet is trained online along with the agent to predict the probability of failure. This probability is then used in a stochastic sampling process to guide the agent to more challenging episodes. This way, instead of wasting time on scenarios that the agent has already mastered, training is focused on the agent's "weak spots". We present the design of CoachNet, explain its underlying principles, and empirically demonstrate its effectiveness in improving sample efficiency and test-time robustness in common continuous control tasks.
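The sampling idea can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the stochastic sampling process is a simple rejection step over candidate start states, and the names `sample_challenging_start`, `propose_start`, `predict_failure`, `floor`, and `max_tries` are hypothetical. The predictor here is an arbitrary callable standing in for CoachNet, which in the paper is trained online alongside the agent.

```python
import random


def sample_challenging_start(propose_start, predict_failure, rng,
                             floor=0.1, max_tries=100):
    """Rejection-sample a start state, favoring predicted failures.

    propose_start(rng) -> candidate start state (hypothetical interface)
    predict_failure(state) -> estimated failure probability in [0, 1]
    floor: minimum acceptance probability (an assumption of this sketch),
           so that mastered scenarios are still visited occasionally.
    """
    for _ in range(max_tries):
        state = propose_start(rng)
        p_fail = predict_failure(state)
        # Accept harder candidates more often than easy ones.
        if rng.random() < max(floor, p_fail):
            return state
    # Every candidate was rejected: fall back to the last proposal.
    return state


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy setup: states are scalars in [0, 1], and larger means harder.
    starts = [
        sample_challenging_start(lambda r: r.random(), lambda s: s, rng)
        for _ in range(1000)
    ]
    print("mean accepted difficulty:", sum(starts) / len(starts))
```

In this toy setup, uniform proposals would have a mean difficulty of 0.5, while the accepted starts skew toward harder states. The `floor` parameter is one possible way to keep some coverage of easy scenarios.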
