Avoiding Side Effects By Considering Future Tasks

Abstract

Designing reward functions is difficult: the designer has to specify what to do (what it means to complete the task) as well as what not to do (side effects that should be avoided while completing the task). To alleviate the burden on the reward designer, we propose an algorithm to automatically generate an auxiliary reward function that penalizes side effects. This auxiliary objective rewards the ability to complete possible future tasks, which decreases if the agent causes side effects during the current task. The future task reward can also give the agent an incentive to interfere with events in the environment that make future tasks less achievable, such as irreversible actions by other agents. To avoid this interference incentive, we introduce a baseline policy that represents a default course of action (such as doing nothing), and use it to filter out future tasks that are not achievable by default. We formally define interference incentives and show that the future task approach with a baseline policy avoids these incentives in the deterministic case. Using gridworld environments that test for side effects and interference, we show that our method avoids interference and is more effective for avoiding side effects than the common approach of penalizing irreversible actions.
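
To make the idea concrete, the following is a minimal illustrative sketch, not the paper's algorithm. It assumes a deterministic environment given as a small transition graph, treats each future task as "reach a goal state", and scores the agent by the fraction of future tasks it can still complete, counting only tasks that remain achievable from the state the baseline policy would have reached. The names `reachable`, `future_task_reward`, and the two-state vase environment are hypothetical and chosen only for illustration.

```python
from collections import deque


def reachable(transitions, start):
    """Return the set of states reachable from `start` (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in transitions.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def future_task_reward(transitions, agent_state, baseline_state, goals):
    """Fraction of default-achievable future tasks the agent can still complete.

    Assumption: a future task is "reach goal state g". Tasks that are not
    reachable from the baseline state are filtered out before scoring.
    """
    baseline_ok = reachable(transitions, baseline_state)
    agent_ok = reachable(transitions, agent_state)
    achievable_by_default = [g for g in goals if g in baseline_ok]
    if not achievable_by_default:
        return 0.0
    hits = sum(g in agent_ok for g in achievable_by_default)
    return hits / len(achievable_by_default)


# Hypothetical toy environment: a vase that can be broken but never repaired.
TRANSITIONS = {"intact": ["intact", "broken"], "broken": ["broken"]}
GOALS = ["intact", "broken"]

# Side effect: by default the vase stays intact, but the agent broke it.
side_effect = future_task_reward(TRANSITIONS, "broken", "intact", GOALS)

# No side effect: the agent left the vase intact.
no_side_effect = future_task_reward(TRANSITIONS, "intact", "intact", GOALS)

# Interference case: the vase breaks by default (e.g. another agent breaks it).
# The "intact" task is filtered out, so preventing the breakage earns nothing extra.
interfered = future_task_reward(TRANSITIONS, "intact", "broken", GOALS)
not_interfered = future_task_reward(TRANSITIONS, "broken", "broken", GOALS)
```

In this toy setting, breaking the vase halves the auxiliary reward (0.5 versus 1.0), while in the interference case both choices score 1.0, so the agent gains nothing by preventing an event that would have happened by default. Without the baseline filter, the agent that kept the vase intact would score higher, which is the interference incentive the abstract describes.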
