SITE: towards Spatial Intelligence Thorough Evaluation

  • 2025-05-08 18:45:44
  • Wenqi Wang, Reuben Tan, Pengyue Zhu, Jianwei Yang, Zhengyuan Yang, Lijuan Wang, Andrey Kolobov, Jianfeng Gao, Boqing Gong

Abstract

Spatial intelligence (SI) represents a cognitive ability encompassing the visualization, manipulation, and reasoning about spatial relationships, underpinning disciplines from neuroscience to robotics. We introduce SITE, a benchmark dataset towards SI Thorough Evaluation in a standardized format of multi-choice visual question-answering, designed to assess large vision-language models' spatial intelligence across diverse visual modalities (single-image, multi-image, and video) and SI factors (figural to environmental scales, spatial visualization and orientation, intrinsic and extrinsic, static and dynamic). Our approach to curating the benchmark combines a bottom-up survey of 31 existing datasets and a top-down strategy drawing upon three classification systems in cognitive science, which prompt us to design two novel types of tasks about view-taking and dynamic scenes. Extensive experiments reveal that leading models fall behind human experts, especially in spatial orientation, a fundamental SI factor. Moreover, we demonstrate a positive correlation between a model's spatial reasoning proficiency and its performance on an embodied AI task.

 
