Translating a Visual LEGO Manual to a Machine-Executable Plan

  • 2022-07-26 00:35:46
  • Ruocheng Wang, Yunzhi Zhang, Jiayuan Mao, Chin-Yi Cheng, Jiajun Wu
We study the problem of translating an image-based, step-by-step assembly manual created by human designers into machine-interpretable instructions. We formulate this problem as a sequential prediction task: at each step, our model reads the manual, locates the components to be added to the current shape, and infers their 3D poses. This task poses two challenges: establishing a 2D-3D correspondence between the manual image and the real 3D object, and estimating 3D poses for unseen 3D objects, since a new component added in a step can itself be an object built in previous steps. To address these two challenges, we present a novel learning-based framework, the Manual-to-Executable-Plan Network (MEPNet), which reconstructs the assembly steps from a sequence of manual images. The key idea is to integrate neural 2D keypoint detection modules with 2D-3D projection algorithms, yielding high-precision predictions and strong generalization to unseen components. MEPNet outperforms existing methods on three newly collected LEGO manual datasets and a Minecraft house dataset.
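The abstract does not describe how the 2D keypoints are lifted to 3D. The snippet below is only a hypothetical illustration of the general idea of pairing a 2D keypoint with a 2D-3D projection step. It is not MEPNet's actual algorithm. It assumes a known pinhole camera (rotation `R`, translation `t`, focal length `f`, principal point `c`) and a discrete set of candidate 3D positions, such as stud locations on a LEGO grid. The functions `project` and `best_candidate` are illustrative names, and the method picks the candidate whose projection lands closest to a detected keypoint.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple

Vec2 = Tuple[float, float]
Vec3 = Tuple[float, float, float]


def project(p: Vec3, R: Sequence[Sequence[float]], t: Vec3, f: float, c: Vec2) -> Vec2:
    """Pinhole projection (assumed camera model): x_cam = R p + t, then (f*x/z + cx, f*y/z + cy)."""
    xc = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        raise ValueError("point is behind the camera")
    return (f * xc[0] / xc[2] + c[0], f * xc[1] / xc[2] + c[1])


def best_candidate(keypoint: Vec2, candidates: List[Vec3], R, t, f, c) -> Vec3:
    """Hypothetical 2D-to-3D step: choose the candidate 3D position with the
    smallest reprojection error relative to the detected 2D keypoint."""
    def reproj_err(p: Vec3) -> float:
        u, v = project(p, R, t, f, c)
        return math.hypot(u - keypoint[0], v - keypoint[1])

    return min(candidates, key=reproj_err)


# Toy setup (all values assumed): identity rotation, camera 10 units from the grid plane.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = (0.0, 0.0, 10.0)
f, c = 100.0, (64.0, 64.0)
grid = [(float(x), float(y), 0.0) for x, y in product(range(4), range(4))]

# A slightly noisy keypoint, e.g. as a keypoint detector might output it.
best = best_candidate((84.5, 73.6), grid, R, t, f, c)
print(best)  # (2.0, 1.0, 0.0)
```

Restricting the search to a discrete candidate set is one way a projection step can turn an imprecise 2D detection into an exact 3D placement. Whether and how MEPNet does something similar is not stated in the abstract.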


