Geometric Algebra Meets Large Language Models: Instruction-Based Transformations of Separate Meshes in 3D, Interactive and Controllable Scenes

Abstract

This paper introduces a novel integration of Large Language Models (LLMs) with Conformal Geometric Algebra (CGA) to revolutionize controllable 3D scene editing, particularly for object repositioning tasks, which traditionally require intricate manual processes and specialized expertise. Conventional methods typically either rely on large training datasets or lack a formalized language for precise edits. Utilizing CGA as a robust formal language, our system, Shenlong, precisely models the spatial transformations necessary for accurate object repositioning. Leveraging the zero-shot learning capabilities of pre-trained LLMs, Shenlong translates natural language instructions into CGA operations, which are then applied to the scene, facilitating exact spatial transformations within 3D scenes without the need for specialized pre-training. Implemented in a realistic simulation environment, Shenlong ensures compatibility with existing graphics pipelines. To accurately assess the impact of CGA, we benchmark against robust Euclidean space baselines, evaluating both latency and accuracy. Comparative performance evaluations indicate that Shenlong significantly reduces LLM response times by 16% and boosts success rates by 9.6% on average compared to traditional methods. Notably, Shenlong achieves a 100% success rate on common practical queries, a benchmark where other systems fall short. These advancements underscore Shenlong's potential to democratize 3D scene editing, enhancing accessibility and fostering innovation across sectors such as education, digital entertainment, and virtual reality.
