Abstract
In the rapidly advancing field of robotics, dual-arm coordination and complex object manipulation are essential capabilities for developing advanced autonomous systems. However, the scarcity of diverse, high-quality demonstration data and real-world-aligned evaluation benchmarks severely limits such development. To address this, we introduce RoboTwin, a generative digital twin framework that uses 3D generative foundation models and large language models to produce diverse expert datasets and provide a real-world-aligned evaluation platform for dual-arm robotic tasks. Specifically, RoboTwin creates varied digital twins of objects from single 2D images, generating realistic and interactive scenarios. It also introduces a spatial relation-aware code generation framework that combines object annotations with large language models to break down tasks, determine spatial constraints, and generate precise robotic movement code. Our framework offers a comprehensive benchmark with both simulated and real-world data, enabling standardized evaluation and better alignment between simulated training and real-world performance. We validated our approach using the open-source COBOT Magic Robot platform. Policies pre-trained on RoboTwin-generated data and fine-tuned with limited real-world samples improve success rates by over 70% for single-arm tasks and over 40% for dual-arm tasks compared to models trained solely on real-world data. This significant improvement demonstrates RoboTwin's potential to enhance the development and evaluation of dual-arm robotic manipulation systems. Project Page: https://robotwin-benchmark.github.io/early-version/.