GeoArena: An Open Platform for Benchmarking Large Vision-language Models on WorldWide Image Geolocalization

Abstract

Image geolocalization aims to predict the geographic location of images captured anywhere on Earth, but its global nature presents significant challenges. Current evaluation methodologies suffer from two major limitations. First, data leakage: advanced approaches often rely on large vision-language models (LVLMs) to predict image locations, yet these models are frequently pretrained on the test datasets, which compromises any assessment of a model's actual geolocalization capability. Second, existing metrics primarily rely on exact geographic coordinates to assess predictions, which not only neglects the reasoning process but also raises privacy concerns when user-level location data is required. To address these issues, we propose GeoArena, the first open platform for evaluating LVLMs on worldwide image geolocalization tasks, offering true in-the-wild and human-centered benchmarking. GeoArena enables users to upload in-the-wild images for a more diverse evaluation corpus, and it leverages pairwise human judgments to determine which model output better aligns with human expectations. Our platform has been deployed online for two months, during which we collected thousands of voting records. Based on this data, we conduct a detailed analysis and establish a leaderboard of different LVLMs on the image geolocalization task.
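
The abstract does not say how pairwise votes are aggregated into a leaderboard. As a minimal sketch only, the following assumes an Elo-style online update, a common choice for arena-style platforms and not necessarily the method GeoArena uses. The function name `elo_leaderboard`, the vote format, the constants `k` and `base`, and the model names are all hypothetical.

```python
from collections import defaultdict


def elo_leaderboard(votes, k=32.0, base=1000.0):
    """Rank models from pairwise votes with an Elo-style update (illustrative).

    votes: iterable of (model_a, model_b, winner), winner in {"a", "b", "tie"}.
    Returns a list of (model, rating) sorted from highest to lowest rating.
    """
    ratings = defaultdict(lambda: base)  # every model starts at the same rating
    outcome = {"a": 1.0, "b": 0.0, "tie": 0.5}  # score credited to model_a
    for model_a, model_b, winner in votes:
        ra, rb = ratings[model_a], ratings[model_b]
        # Expected score of model_a under the standard Elo logistic curve.
        expected_a = 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))
        score_a = outcome[winner]
        # Zero-sum update: whatever model_a gains, model_b loses.
        delta = k * (score_a - expected_a)
        ratings[model_a] = ra + delta
        ratings[model_b] = rb - delta
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


# Hypothetical votes; the model names are placeholders, not real systems.
example_votes = [
    ("model_x", "model_y", "a"),
    ("model_x", "model_z", "a"),
    ("model_y", "model_z", "tie"),
]
for name, rating in elo_leaderboard(example_votes):
    print(f"{name}: {rating:.1f}")
```

Because the update is sequential, the resulting ratings depend on vote order. An order-independent alternative would be to fit a Bradley-Terry model to all votes at once.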
