Abstract
Geospatial models must adapt to the diversity of Earth observation data in terms of resolutions, scales, and modalities. However, existing approaches expect fixed input configurations, which limits their practical applicability. We propose AnySat, a multimodal model based on joint embedding predictive architecture (JEPA) and resolution-adaptive spatial encoders, allowing us to train a single model on highly heterogeneous data in a self-supervised manner. To demonstrate the advantages of this unified approach, we compile GeoPlex, a collection of $5$ multimodal datasets with varying characteristics and $11$ distinct sensors. We then train a single powerful model on these diverse datasets simultaneously. Once fine-tuned, we achieve better or near state-of-the-art results on the datasets of GeoPlex and $4$ additional ones for $5$ environment monitoring tasks: land cover mapping, tree species identification, crop type classification, change detection, and flood segmentation. The code and models are available at https://github.com/gastruc/AnySat.