Abstract
Today, Earth Observation (EO) satellites generate massive volumes of data, with the Copernicus Sentinel-2 constellation alone producing approximately 1.6TB per day. To fully exploit this information, it is essential to pretrain EO Foundation Models (FMs) on large unlabeled datasets, enabling efficient fine-tuning for several different downstream tasks with minimal labeled data. In this work, we present the scaling-up of our recently proposed EO Foundation Model, PhilEO Geo-Aware U-Net, on the unlabeled 23TB dataset MajorTOM, which covers the vast majority of the Earth's surface, as well as on the specialized subset FastTOM 2TB that does not include oceans and ice. We develop and study various PhilEO model variants with different numbers of parameters and architectures. Finally, we fine-tune the models on the PhilEO Bench for road density estimation, building density pixel-wise regression, and land cover semantic segmentation, and we evaluate the performance. Our results demonstrate that for all n-shots for road density regression, the PhilEO 44M MajorTOM 23TB model outperforms PhilEO Globe 0.5TB 44M. We also show that for most n-shots for road density estimation and building density regression, PhilEO 200M FastTOM outperforms all the other models. The effectiveness of both dataset and model scaling is validated using the PhilEO Bench. We also study the impact of architecture scaling, transitioning from U-Net Convolutional Neural Networks (CNN) to Vision Transformers (ViT).