Abstract
What does it take to build mobile manipulation systems that can competently operate on previously unseen objects in previously unseen environments? This work answers this question using the opening of articulated objects as a mobile manipulation testbed. Specifically, our focus is on end-to-end performance on this task without any privileged information, i.e., the robot starts at a location with the novel target articulated object in view, and has to approach the object and successfully open it. We first develop a system for this task, and then conduct 100+ end-to-end system tests across 13 real-world test sites. Our large-scale study reveals a number of surprising findings: a) modular systems outperform end-to-end learned systems for this task, even when the end-to-end learned systems are trained on 1000+ demonstrations; b) perception, and not precise end-effector control, is the primary bottleneck to task success; and c) state-of-the-art articulation parameter estimation models developed in isolation struggle when faced with robot-centric viewpoints. Overall, our findings highlight the limitations of developing components of the pipeline in isolation and underscore the need for system-level research, providing a pragmatic roadmap for building generalizable mobile manipulation systems. Videos, code, and models are available on the project website: https://arjung128.github.io/opening-articulated-objects/