Abstract
We introduce LegoGPT, the first approach for generating physically stable LEGO brick models from text prompts. To achieve this, we construct a large-scale, physically stable dataset of LEGO designs, along with their associated captions, and train an autoregressive large language model to predict the next brick to add via next-token prediction. To improve the stability of the resulting designs, we employ an efficient validity check and physics-aware rollback during autoregressive inference, which prunes infeasible token predictions using physics laws and assembly constraints. Our experiments show that LegoGPT produces stable, diverse, and aesthetically pleasing LEGO designs that align closely with the input text prompts. We also develop a text-based LEGO texturing method to generate colored and textured designs. We show that our designs can be assembled manually by humans and automatically by robotic arms. We also release our new dataset, StableText2Lego, containing over 47,000 LEGO structures of over 28,000 unique 3D objects accompanied by detailed captions, along with our code and models at the project website: https://avalovelace1.github.io/LegoGPT/.
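
To make the inference-time idea concrete, the following is a minimal toy sketch of brick-by-brick generation with a validity check and rollback. It is not the released LegoGPT code, and everything in it is an assumption made for illustration: the 2D grid, the brick encoding `(x, z, width)`, the function names, the random `propose_brick` that stands in for sampling the next brick from the language model, and the "something must sit directly underneath" support rule that stands in for a real physics-based stability analysis.

```python
import random
from typing import List, Optional, Set, Tuple

Brick = Tuple[int, int, int]  # hypothetical encoding: (x, z, width) on a 2D grid
GRID_W, GRID_H = 8, 4         # assumed toy build-volume size


def cells(brick: Brick) -> Set[Tuple[int, int]]:
    """Grid cells covered by a brick that is one unit tall."""
    x, z, w = brick
    return {(x + k, z) for k in range(w)}


def occupied(bricks: List[Brick]) -> Set[Tuple[int, int]]:
    """All grid cells covered by the given bricks."""
    return set().union(*(cells(b) for b in bricks))


def is_valid(brick: Brick, bricks: List[Brick]) -> bool:
    """Cheap per-brick check: inside the grid and not colliding."""
    x, z, w = brick
    if x < 0 or x + w > GRID_W or not 0 <= z < GRID_H:
        return False
    return cells(brick).isdisjoint(occupied(bricks))


def first_unstable(bricks: List[Brick]) -> Optional[int]:
    """Index of the first unsupported brick, or None if all are supported.

    Crude stand-in for a physics analysis: a brick counts as supported
    if it is on the ground or any cell directly beneath it is filled.
    """
    filled = occupied(bricks)
    for i, (x, z, w) in enumerate(bricks):
        if z > 0 and not any((x + k, z - 1) in filled for k in range(w)):
            return i
    return None


def propose_brick(rng: random.Random) -> Brick:
    """Placeholder for the language model's next-brick prediction."""
    return (rng.randrange(GRID_W), rng.randrange(GRID_H), rng.choice([1, 2, 4]))


def generate(seed: int = 0, n_bricks: int = 6, max_samples: int = 2000) -> List[Brick]:
    rng = random.Random(seed)
    bricks: List[Brick] = []
    for _ in range(max_samples):
        if len(bricks) == n_bricks:
            bad = first_unstable(bricks)
            if bad is None:
                return bricks
            del bricks[bad:]          # rollback: drop the first unstable brick and all later ones
        candidate = propose_brick(rng)
        if is_valid(candidate, bricks):  # reject infeasible proposals
            bricks.append(candidate)
    # Sampling budget exhausted: keep rolling back until what remains is supported.
    while (bad := first_unstable(bricks)) is not None:
        del bricks[bad:]
    return bricks
```

In this sketch the cheap validity check runs on every proposal, while the support check runs only on a completed structure, so a failure discards the first offending brick and everything placed after it before generation resumes. The returned structure always passes the toy support rule, but it may contain fewer than `n_bricks` bricks if the sampling budget runs out.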