LL3M: Large Language 3D Modelers

Abstract

We present LL3M, a multi-agent system that leverages pretrained large language models (LLMs) to generate 3D assets by writing interpretable Python code in Blender. We break away from the typical generative approach that learns from a collection of 3D data. Instead, we reformulate shape generation as a code-writing task, enabling greater modularity, editability, and integration with artist workflows. Given a text prompt, LL3M coordinates a team of specialized LLM agents to plan, retrieve, write, debug, and refine Blender scripts that generate and edit geometry and appearance. The generated code works as a high-level, interpretable, human-readable, well-documented representation of scenes and objects, making full use of sophisticated Blender constructs (e.g., B-meshes, geometry modifiers, shader nodes) for diverse, unconstrained shapes, materials, and scenes. This code presents many avenues for further agent and human editing and experimentation via code tweaks or procedural parameters. This medium naturally enables a co-creative loop in our system: agents can automatically self-critique using code and visuals, while iterative user instructions provide an intuitive way to refine assets. A shared code context across agents enables awareness of previous attempts, and a retrieval-augmented generation knowledge base built from Blender API documentation, BlenderRAG, equips agents with examples, types, and functions, empowering advanced modeling operations and code correctness. We demonstrate the effectiveness of LL3M across diverse shape categories, style and material edits, and user-driven refinements. Our experiments showcase the power of code as a generative and interpretable medium for 3D asset creation. Our project page is at https://threedle.github.io/ll3m.
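
As a rough illustration of the write-and-debug portion of the loop described above, the following minimal sketch shows how a shared code context might let an agent see earlier attempts and their errors. It is not the LL3M implementation: `SharedContext`, `run_script`, `write_and_debug`, and `toy_coder` are hypothetical names, the "agent" is a hard-coded stand-in for an LLM call, and plain Python `exec` stands in for running a script inside Blender.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SharedContext:
    """Hypothetical code context visible to every agent."""
    prompt: str
    attempts: List[str] = field(default_factory=list)  # every script tried so far
    errors: List[str] = field(default_factory=list)    # one message per failed attempt


def run_script(script: str) -> Optional[str]:
    """Run a script; return None on success, or the error text on failure.

    Assumption: plain exec() replaces execution inside Blender.
    """
    try:
        exec(script, {})
        return None
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"


def write_and_debug(
    ctx: SharedContext,
    coder: Callable[[SharedContext], str],
    max_rounds: int = 3,
) -> Optional[str]:
    """Ask the coder for a script, run it, and feed errors back until it runs."""
    for _ in range(max_rounds):
        script = coder(ctx)
        ctx.attempts.append(script)
        error = run_script(script)
        if error is None:
            return script          # first script that executes cleanly
        ctx.errors.append(error)   # later rounds can read this failure
    return None                    # gave up after max_rounds


def toy_coder(ctx: SharedContext) -> str:
    """Stand-in for an LLM agent: buggy first draft, fixed once an error is recorded."""
    if not ctx.errors:
        return "radius = 1.0\nheight = radius * scale  # 'scale' is undefined"
    return "radius = 1.0\nscale = 2.0\nheight = radius * scale"


ctx = SharedContext(prompt="a simple cone")
final_script = write_and_debug(ctx, toy_coder)
```

In this toy run the first draft fails with a `NameError`, the error is recorded in the shared context, and the second draft executes. The planning, retrieval, and visual self-critique steps are omitted here.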
