Abstract
Objects are made of parts, each with distinct geometry, physics, functionality, and affordances. Developing such a distributed, physical, interpretable representation of objects will help intelligent agents better explore and interact with the world. In this paper, we study physical primitive decomposition: understanding an object through its components, each with physical and geometric attributes. As annotated data for object parts and physics are rare, we propose a novel formulation that learns physical primitives by explaining both an object's appearance and its behaviors in physical events. Our model performs well on block towers and tools in both synthetic and real scenarios; we also demonstrate that visual and physical observations often provide complementary signals. We further present ablation and behavioral studies to better understand our model and contrast it with human performance.