Abstract
Modern interactive applications increasingly demand dynamic 3D content, yet the transformation of static 3D models into animated assets constitutes a significant bottleneck in content creation pipelines. While recent advances in generative AI have revolutionized static 3D model creation, rigging and animation continue to depend heavily on expert intervention. We present Puppeteer, a comprehensive framework that addresses both automatic rigging and animation for diverse 3D objects. Our system first predicts plausible skeletal structures via an auto-regressive transformer that introduces a joint-based tokenization strategy for compact representation and a hierarchical ordering methodology with stochastic perturbation that enhances bidirectional learning capabilities. It then infers skinning weights via an attention-based architecture incorporating topology-aware joint attention that explicitly encodes inter-joint relationships based on skeletal graph distances. Finally, we complement these rigging advances with a differentiable optimization-based animation pipeline that generates stable, high-fidelity animations while being computationally more efficient than existing approaches. Extensive evaluations across multiple benchmarks demonstrate that our method significantly outperforms state-of-the-art techniques in both skeletal prediction accuracy and skinning quality. The system robustly processes diverse 3D content, ranging from professionally designed game assets to AI-generated shapes, producing temporally coherent animations that eliminate the jittering issues common in existing methods.
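To make the notion of skeletal graph distance concrete, the following is a minimal sketch of how pairwise hop counts between joints could be computed from a skeleton tree. It is an illustration only: the abstract does not describe how Puppeteer computes these distances or feeds them into attention, and the function name `skeleton_graph_distances` and the parent-index encoding (with `-1` marking the root) are assumptions made for this example.

```python
from collections import deque


def skeleton_graph_distances(parents):
    """Return an N x N matrix of hop counts between joints.

    parents[i] is the index of joint i's parent, or -1 for the root.
    This encoding is a hypothetical choice for the sketch, not taken
    from the paper.
    """
    n = len(parents)

    # Build an undirected adjacency list from the parent pointers.
    neighbors = [[] for _ in range(n)]
    for child, parent in enumerate(parents):
        if parent >= 0:
            neighbors[child].append(parent)
            neighbors[parent].append(child)

    dist = [[-1] * n for _ in range(n)]
    for source in range(n):
        # Breadth-first search yields shortest hop counts from `source`.
        dist[source][source] = 0
        queue = deque([source])
        while queue:
            joint = queue.popleft()
            for nxt in neighbors[joint]:
                if dist[source][nxt] < 0:
                    dist[source][nxt] = dist[source][joint] + 1
                    queue.append(nxt)
    return dist


# Toy skeleton: 0 = root, 1 = spine, 2 = left arm, 3 = right arm, 4 = left hand.
parents = [-1, 0, 1, 1, 2]
distances = skeleton_graph_distances(parents)
```

In a topology-aware attention layer, a matrix like `distances` might be mapped to a learned bias added to the joint-to-joint attention logits, so that joints close together on the skeleton attend to each other differently from distant ones. Whether Puppeteer uses this mechanism is not stated in the abstract.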