Formalizing the Problem of Side Effect Regularization

Abstract

AI objectives are often hard to specify properly. Some approaches tackle this problem by regularizing the AI's side effects: agents must trade off "how much of a mess they make" against an imperfectly specified proxy objective. We propose a formal criterion for side effect regularization via the assistance game framework. In these games, the agent solves a partially observable Markov decision process (POMDP) representing its uncertainty about the objective function it should optimize. We consider the setting where the true objective is revealed to the agent at a later time step. We show that this POMDP is solved by trading off the proxy reward with the agent's ability to achieve a range of future tasks. We empirically demonstrate the reasonableness of our problem formalization via ground-truth evaluation in two gridworld environments.
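
The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `tradeoff_score`, the two plans, the uniform distribution over candidate tasks, and all numbers are hypothetical, chosen only to show how proxy reward collected before the true objective is revealed might be weighed against the agent's remaining ability to complete a range of future tasks.

```python
from statistics import mean


def tradeoff_score(proxy_return, future_task_values, discount, reveal_step):
    """Score a plan as proxy return plus discounted expected future-task value.

    proxy_return: (already discounted) proxy reward collected before the reveal.
    future_task_values: optimal value attainable for each candidate task from
        the state the plan ends in (assumed equally likely; an assumption).
    discount: discount factor in (0, 1].
    reveal_step: time step at which the true objective is revealed.
    """
    return proxy_return + discount ** reveal_step * mean(future_task_values)


# Hypothetical plans: "messy" earns more proxy reward but leaves only one of
# three candidate tasks achievable; "careful" earns less but keeps all three.
plans = {
    "messy": {"proxy_return": 1.0, "future_task_values": [0.0, 0.0, 2.0]},
    "careful": {"proxy_return": 0.8, "future_task_values": [2.0, 2.0, 2.0]},
}

scores = {
    name: tradeoff_score(
        plan["proxy_return"],
        plan["future_task_values"],
        discount=0.9,
        reveal_step=2,
    )
    for name, plan in plans.items()
}
best_plan = max(scores, key=scores.get)
print(scores, best_plan)
```

Under these assumed numbers the careful plan scores higher, because the small loss in proxy reward is outweighed by the options it preserves.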
