Abstract
Humans do not passively observe the visual world -- we actively look in order to act. Motivated by this principle, we introduce EyeRobot, a robotic system with gaze behavior that emerges from the need to complete real-world tasks. We develop a mechanical eyeball that can freely rotate to observe its surroundings and train a gaze policy to control it using reinforcement learning. We accomplish this by first collecting teleoperated demonstrations paired with a 360 camera. This data is imported into a simulation environment that supports rendering arbitrary eyeball viewpoints, allowing episode rollouts of eye gaze on top of robot demonstrations. We then introduce a BC-RL loop to train the hand and eye jointly: the hand (BC) agent is trained from rendered eye observations, and the eye (RL) agent is rewarded when the hand produces correct action predictions. In this way, hand-eye coordination emerges as the eye looks towards regions which allow the hand to complete the task. EyeRobot implements a foveal-inspired policy architecture allowing high resolution with a small compute budget, which we find also leads to the emergence of more stable fixation as well as improved ability to track objects and ignore distractors. We evaluate EyeRobot on five panoramic workspace manipulation tasks requiring manipulation in an arc surrounding the robot arm. Our experiments suggest EyeRobot exhibits hand-eye coordination behaviors which effectively facilitate manipulation over large workspaces with a single camera. See project site for videos: https://www.eyerobot.net/
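
To make the coupling in the BC-RL loop concrete, the following is a minimal sketch of one plausible way the eye's reward could be derived from the hand's prediction accuracy. The abstract does not specify the reward's functional form, so the negative L2 error used here is an assumption, and the names `eye_reward` and `toy_hand_prediction` are hypothetical stand-ins rather than parts of the actual system.

```python
import math
from typing import List, Sequence


def eye_reward(predicted_action: Sequence[float],
               demo_action: Sequence[float]) -> float:
    """Assumed reward: negative L2 distance between the hand's predicted
    action and the demonstrated action (higher is better, at most 0)."""
    if len(predicted_action) != len(demo_action):
        raise ValueError("action vectors must have the same length")
    return -math.dist(predicted_action, demo_action)


def toy_hand_prediction(gaze_deg: float, target_deg: float,
                        demo_action: Sequence[float]) -> List[float]:
    """Toy stand-in for the BC hand policy (not a learned model): its
    prediction error grows as the gaze moves away from the target."""
    error = abs(gaze_deg - target_deg) / 180.0
    return [a + error for a in demo_action]


demo = [0.1, 0.2]
aligned = eye_reward(toy_hand_prediction(0.0, 0.0, demo), demo)
misaligned = eye_reward(toy_hand_prediction(90.0, 0.0, demo), demo)
```

In this toy setting the gaze that points at the target earns the higher reward, which illustrates the intended incentive: the eye is pushed towards viewpoints from which the hand predicts actions well.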