Abstract
Designing optimal reward functions is desirable but extremely difficult in reinforcement learning (RL). For modern complex tasks, sophisticated reward functions are widely used to simplify policy learning, yet even a tiny adjustment to them is expensive to evaluate because the cost of training increases drastically. To this end, we propose a hindsight reward tweaking approach, a novel paradigm for deep reinforcement learning that models the influence of reward functions within a near-optimal space. We simply extend the input observation with a condition vector linearly correlated with the effective environment reward parameters and train the model in the conventional manner, except that reward configurations are randomized. This yields a hyper-policy whose characteristics are sensitively regulated over the condition space. We demonstrate the feasibility of this approach and study one of its potential applications, policy performance boosting, on multiple MuJoCo tasks.
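The conditioning scheme described in the abstract can be illustrated with a minimal sketch. It assumes that the shaped reward is a weighted sum of reward terms, that the "effective environment reward parameters" are those weights, and that each weight varies within a hypothetical near-optimal range `[low, high]`. The condition vector lives in `[-1, 1]^d` and maps linearly onto the weights. `RewardConfig`, `shaped_reward`, `augment_observation`, and the chosen ranges are illustrative assumptions, not the authors' implementation.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class RewardConfig:
    """Hypothetical reward parameterization: one weight per reward term."""
    low: List[float]   # assumed lower bound of each weight's near-optimal range
    high: List[float]  # assumed upper bound of each weight's near-optimal range

    def sample_condition(self, rng: random.Random) -> List[float]:
        # Randomized reward configuration: draw a condition vector uniformly from [-1, 1]^d.
        return [rng.uniform(-1.0, 1.0) for _ in self.low]

    def weights_from_condition(self, cond: Sequence[float]) -> List[float]:
        # Linear map from condition space [-1, 1] to the weight range [low, high].
        return [lo + (c + 1.0) * 0.5 * (hi - lo)
                for c, lo, hi in zip(cond, self.low, self.high)]


def shaped_reward(terms: Sequence[float], weights: Sequence[float]) -> float:
    # Assumed form of the shaped reward: a weighted sum of individual reward terms.
    return sum(w * r for w, r in zip(weights, terms))


def augment_observation(obs: Sequence[float], cond: Sequence[float]) -> List[float]:
    # The policy sees the raw observation concatenated with the condition vector.
    return list(obs) + list(cond)


# Usage: at the start of each training episode, resample the condition vector,
# then use it both to weight the reward and to extend every observation.
rng = random.Random(0)
cfg = RewardConfig(low=[0.5, 0.0], high=[1.5, 0.2])  # hypothetical ranges
cond = cfg.sample_condition(rng)
weights = cfg.weights_from_condition(cond)
obs = augment_observation([0.1, -0.3, 0.7], cond)
reward = shaped_reward([1.0, -2.0], weights)
```

Because the policy is trained under many reward configurations while observing the matching condition vector, a single network can, in principle, be steered after training by choosing a different condition vector, which is what makes "hindsight" tweaking cheap compared with retraining.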