Direct Numerical Layout Generation for 3D Indoor Scene Synthesis via Spatial Reasoning

  • 2025-06-05 18:59:42
  • Xingjian Ran, Yixuan Li, Linning Xu, Mulin Yu, Bo Dai

Abstract

Realistic 3D indoor scene synthesis is vital for embodied AI and digital content creation. It can be naturally divided into two subtasks: object generation and layout generation. While recent generative models have significantly advanced object-level quality and controllability, layout generation remains challenging due to limited datasets. Existing methods either overfit to these datasets or rely on predefined constraints to optimize numerical layouts, which sacrifices flexibility. As a result, they fail to generate scenes that are both open-vocabulary and aligned with fine-grained user instructions. We introduce DirectLayout, a framework that directly generates numerical 3D layouts from text descriptions using the generalizable spatial reasoning of large language models (LLMs). DirectLayout decomposes the generation into three stages: producing a Bird's-Eye View (BEV) layout, lifting it into 3D space, and refining object placements. To enable explicit spatial reasoning and help the model grasp basic principles of object placement, we employ Chain-of-Thought (CoT) Activation based on the 3D-Front dataset. Additionally, we design a CoT-Grounded Generative Layout Reward to enhance generalization and spatial planning. During inference, DirectLayout addresses asset-layout mismatches via Iterative Asset-Layout Alignment through in-context learning. Extensive experiments demonstrate that DirectLayout achieves impressive semantic consistency, generalization, and physical plausibility.
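The abstract does not specify DirectLayout's numerical output format. The sketch below assumes that each object is a box described by a floor-plane center, a footprint, and a yaw angle. That schema is hypothetical and not the authors'. Under this assumption it illustrates what "produce a BEV layout, then lift it into 3D" means for the numbers, along with a basic collision check of the kind a physical-plausibility evaluation might use. `BEVBox`, `Box3D`, `lift_to_3d`, and `colliding_pairs` are illustrative names only. In DirectLayout itself the values are generated by the LLM, not by hand-written rules like these.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Optional, Tuple


@dataclass
class BEVBox:
    """Hypothetical stage-1 entry: a footprint on the floor plane (metres)."""
    name: str
    x: float        # footprint center along x
    y: float        # footprint center along y
    width: float    # extent along x
    depth: float    # extent along y
    yaw: float = 0.0  # rotation in degrees (ignored by the overlap test below)


@dataclass
class Box3D:
    """Hypothetical stage-2 entry: the BEV footprint plus a vertical extent."""
    name: str
    x: float
    y: float
    z: float        # height of the bottom face above the floor
    width: float
    depth: float
    height: float
    yaw: float = 0.0


def lift_to_3d(bev: List[BEVBox], heights: Dict[str, float],
               elevations: Optional[Dict[str, float]] = None) -> List[Box3D]:
    """Attach a height and a base elevation (default: on the floor) to each footprint."""
    elevations = elevations or {}
    return [
        Box3D(b.name, b.x, b.y, elevations.get(b.name, 0.0),
              b.width, b.depth, heights[b.name], b.yaw)
        for b in bev
    ]


def overlaps(a: Box3D, b: Box3D) -> bool:
    """Axis-aligned overlap test; treats every box as unrotated (yaw ignored).

    Touching faces do not count as a collision, so an object resting
    exactly on top of another is allowed.
    """
    overlap_x = abs(a.x - b.x) < (a.width + b.width) / 2
    overlap_y = abs(a.y - b.y) < (a.depth + b.depth) / 2
    overlap_z = a.z < b.z + b.height and b.z < a.z + a.height
    return overlap_x and overlap_y and overlap_z


def colliding_pairs(boxes: List[Box3D]) -> List[Tuple[str, str]]:
    """Return the names of every pair of boxes whose volumes intersect."""
    return [(a.name, b.name) for a, b in combinations(boxes, 2) if overlaps(a, b)]
```

For example, with a bed, a nightstand, and a lamp sharing the nightstand's footprint, lifting the lamp to the nightstand's top (`elevations={"lamp": 0.6}` with a nightstand height of 0.6) produces no collisions. Leaving the lamp on the floor yields the pair `("nightstand", "lamp")`. A refinement stage could use such a check to flag placements that need adjusting. Rotated boxes would need an oriented-box test instead.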
