Abstract
Automatic indoor layout generation has attracted increasing attention due to its potential in interior design, virtual environment construction, and embodied AI. Existing methods fall into two categories: prompt-driven approaches that leverage proprietary LLM services (e.g., GPT APIs) and learning-based methods that train diffusion-based models on layout data. Prompt-driven methods often suffer from spatial inconsistency and high computational costs, while learning-based methods are typically constrained by coarse relational graphs and limited datasets, restricting their generalization to diverse room categories. In this paper, we revisit LLM-based indoor layout generation and present 3D-SynthPlace, a large-scale dataset that combines synthetic layouts generated via a 'GPT synthesize, Human inspect' pipeline, upgraded from the 3D-Front dataset. 3D-SynthPlace contains nearly 17,000 scenes, covering four common room types (bedroom, living room, kitchen, and bathroom), enriched with diverse objects and high-level spatial annotations. We further introduce OptiScene, a strong open-source LLM optimized for indoor layout generation, fine-tuned on our 3D-SynthPlace dataset through a two-stage training procedure. In the warm-up stage I, we adopt supervised fine-tuning (SFT), in which the model is taught to first generate high-level spatial descriptions and then conditionally predict concrete object placements. In the reinforcing stage II, to better align the generated layouts with human design preferences, we apply multi-turn direct preference optimization (DPO), which significantly improves layout quality and generation success rates. Extensive experiments demonstrate that OptiScene outperforms traditional prompt-driven and learning-based baselines. Moreover, OptiScene shows promising potential in interactive tasks such as scene editing and robot navigation.
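
For readers unfamiliar with the objective behind stage II, the following is a minimal sketch of the standard single-pair DPO loss, not the multi-turn variant used for OptiScene, whose details the abstract does not give. The function name `dpo_loss`, its arguments, and the default `beta` are illustrative assumptions; the inputs are assumed to be summed log-probabilities of a preferred ("chosen") and a dispreferred ("rejected") layout under the trained policy and under a frozen reference model.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_logp_chosen: float,
    policy_logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # assumed value; controls how far the policy may drift from the reference
) -> float:
    """Standard DPO loss for one preference pair (hypothetical helper)."""
    # How much more likely each response became relative to the reference model.
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    # -log(sigmoid(z)) equals softplus(-z).
    z = beta * (chosen_margin - rejected_margin)
    return softplus(-z)


# Example: the policy has raised the chosen layout's likelihood and lowered
# the rejected one's, so the loss falls below the log(2) starting point.
loss = dpo_loss(-8.0, -12.0, -10.0, -10.0)
```

When the policy and the reference agree on both responses, the loss equals log 2; it decreases as the policy shifts probability toward the preferred layout relative to the reference.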