Abstract
Most state-of-the-art approaches for weather and climate modeling are based on physics-informed numerical models of the atmosphere. These approaches aim to model the non-linear dynamics and complex interactions between multiple variables, which are challenging to approximate. Additionally, many such numerical models are computationally intensive, especially when modeling atmospheric phenomena at a fine-grained spatial and temporal resolution. Recent data-driven approaches based on machine learning instead aim to directly solve a downstream forecasting or projection task by learning a data-driven functional mapping using deep neural networks. However, these networks are trained using curated and homogeneous climate datasets for specific spatiotemporal tasks, and thus lack the generality of numerical models. We develop and demonstrate ClimaX, a flexible and generalizable deep learning model for weather and climate science that can be trained using heterogeneous datasets spanning different variables, spatiotemporal coverage, and physical groundings. ClimaX extends the Transformer architecture with novel encoding and aggregation blocks that allow effective use of available compute while maintaining general utility. ClimaX is pre-trained with a self-supervised learning objective on climate datasets derived from CMIP6. The pre-trained ClimaX can then be fine-tuned to address a breadth of climate and weather tasks, including those that involve atmospheric variables and spatiotemporal scales unseen during pre-training. Compared to existing data-driven baselines, we show that this generality in ClimaX results in superior performance on benchmarks for weather forecasting and climate projections, even when pre-trained at lower resolutions and compute budgets.