Measuring and avoiding side effects using relative reachability

  • 2018-06-04 16:30:17
  • Victoria Krakovna, Laurent Orseau, Miljan Martic, Shane Legg


How can we design reinforcement learning agents that avoid causing unnecessary disruptions to their environment? We argue that current approaches to penalizing side effects can introduce bad incentives in tasks that require irreversible actions, and in environments that contain sources of change other than the agent. For example, some approaches give the agent an incentive to prevent any irreversible changes in the environment, including the actions of other agents. We introduce a general definition of side effects, based on relative reachability of states compared to a default state, that avoids these undesirable incentives. Using a set of gridworld experiments illustrating relevant scenarios, we empirically compare relative reachability to penalties based on existing definitions and show that it is the only penalty among those tested that produces the desired behavior in all the scenarios.
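To make the idea of relative reachability concrete, here is a minimal sketch in Python. It assumes a small deterministic environment given as a directed graph; the graph, the state names, and the functions `reachability` and `relative_reachability` are hypothetical illustrations, not the authors' code. Reachability of a state y from a state x is taken as gamma raised to the length of the shortest path from x to y (0 if y is unreachable; gamma = 1 gives plain yes/no reachability). The penalty is the average, over all states, of how much less reachable each state is from the current state than from the default (baseline) state, counting only reductions.

```python
from collections import deque
from typing import Dict, List

# Hypothetical toy environment: a deterministic MDP given as an adjacency
# dict mapping each state to the states reachable in one step.
Graph = Dict[str, List[str]]


def shortest_distances(graph: Graph, source: str) -> Dict[str, int]:
    """Breadth-first search: number of steps from `source` to each reachable state."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        s = queue.popleft()
        for nxt in graph.get(s, []):
            if nxt not in dist:
                dist[nxt] = dist[s] + 1
                queue.append(nxt)
    return dist


def reachability(graph: Graph, x: str, gamma: float) -> Dict[str, float]:
    """R(x; y) for every state y: gamma ** steps if y is reachable from x, else 0.

    In a deterministic environment the fastest policy follows a shortest path,
    so gamma ** (shortest distance) is the discounted reachability.
    With gamma = 1.0 this reduces to binary (undiscounted) reachability.
    """
    dist = shortest_distances(graph, x)
    return {y: (gamma ** dist[y] if y in dist else 0.0) for y in graph}


def relative_reachability(graph: Graph, current: str, baseline: str,
                          gamma: float = 1.0) -> float:
    """Average reduction in reachability of all states, relative to the baseline.

    Only reductions count (truncated at 0), so making states *more*
    reachable than in the baseline earns no bonus.
    """
    r_cur = reachability(graph, current, gamma)
    r_base = reachability(graph, baseline, gamma)
    return sum(max(r_base[y] - r_cur[y], 0.0) for y in graph) / len(graph)


# Toy example: "A" is the default state, "B" is a reversible move away from it,
# and "C" is absorbing, so moving there is irreversible (think of breaking a vase).
toy = {"A": ["A", "B", "C"], "B": ["A", "B"], "C": ["C"]}
print(relative_reachability(toy, "B", "A"))  # reversible move: 0.0
print(relative_reachability(toy, "C", "A"))  # irreversible move: 0.6666666666666666
```

In this toy case the reversible move costs nothing, while the irreversible one is penalized because two of the three states can no longer be reached. With `gamma < 1`, a reversible move that only lengthens paths to other states also receives a small penalty.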

