Abstract
Current deep reinforcement learning approaches incorporate minimal prior knowledge about the environment, limiting computational and sample efficiency. Objects provide a succinct and causal description of the world, and several recent works have studied unsupervised object representation learning using priors and losses over static object properties like visual consistency. However, object dynamics and interaction are critical cues for objectness. In addition, extensive research has shown humans have a working memory limited to only a small number of task-relevant objects. In this paper, we propose a framework for reasoning about object dynamics and behavior to rapidly determine minimal and task-specific object representations. We show the need for this reasoning over object behavior and dynamics by introducing a suite of RGBD MuJoCo object collection and avoidance tasks that, while intuitive and visually simple, confound state-of-the-art unsupervised object representation learning algorithms. We also demonstrate the potential of this framework on a number of Atari games, using our object representation and standard RL and planning algorithms to learn over 10,000x faster than standard deep RL algorithms, and faster even than human players.