Abstract
Urban tree biodiversity is critical for climate resilience, ecological stability, and livability in cities, yet most municipalities lack detailed knowledge of their canopies. Field-based inventories provide reliable estimates of Shannon and Simpson diversity but are costly and time-consuming, while supervised AI methods require labeled data that often fail to generalize across regions. We introduce an unsupervised clustering framework that integrates visual embeddings from street-level imagery with spatial planting patterns to estimate biodiversity without labels. Applied to eight North American cities, the method recovers genus-level diversity patterns with high fidelity, achieving low Wasserstein distances to ground truth for Shannon and Simpson indices and preserving spatial autocorrelation. This scalable, fine-grained approach enables biodiversity mapping in cities lacking detailed inventories and offers a pathway for continuous, low-cost monitoring to support equitable access to greenery and adaptive management of urban ecosystems.
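
For readers unfamiliar with the metrics named above, the following is a minimal sketch of how Shannon and Simpson indices can be computed from group labels and how two sets of per-area index values can be compared with a one-dimensional Wasserstein distance. It is an illustration, not the implementation used in this work. The function names, the toy labels, and the per-area values are hypothetical, and the Gini-Simpson form (1 minus the sum of squared proportions) is an assumption, since the abstract does not state which Simpson variant is used.

```python
import math
from collections import Counter


def shannon_index(labels):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over group proportions."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def simpson_index(labels):
    """Simpson diversity in the Gini-Simpson form, 1 - sum(p_i ** 2) (assumed variant)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-size 1-D samples.

    For equal-size empirical samples this is the mean absolute
    difference between the sorted values.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch only handles samples of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


# Hypothetical example: genus labels from an inventory vs. unlabeled cluster IDs.
true_genera = ["Acer", "Acer", "Quercus", "Quercus"]
cluster_ids = [0, 0, 1, 1]

# Both indices depend only on group proportions, so cluster IDs
# do not need to be matched to genus names.
h_true, h_pred = shannon_index(true_genera), shannon_index(cluster_ids)
d_true, d_pred = simpson_index(true_genera), simpson_index(cluster_ids)

# Hypothetical per-area Shannon values for two sources, compared as distributions.
distance = wasserstein_1d([0.0, 1.0], [1.0, 2.0])
```

Because the indices are functions of proportions alone, an unsupervised grouping can in principle reproduce them without ever naming a genus, which is the property the label-free estimate relies on.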