ClimXtract: A Python Toolkit for Standardizing High-Resolution Climate Datasets for Regional Domains

Maximilian Meindl; Luiza Sabchuk; Aiko Voigt

doi:10.5334/jors.627


(1) Overview

Introduction

As the demand for high-resolution climate data has increased rapidly in recent years, so too has its availability. The sources of high-resolution data are manifold and include observations, reanalyses, as well as simulations with regional and global models. This diversity can make working with these datasets challenging, as they use different file formats, spatial resolutions, coordinate systems, variable naming conventions, and physical units. Applications of climate data often need to combine various data sources for regional climate studies; however, inconsistencies in grid type, projection, and units make cross-dataset comparison time-consuming and error-prone, particularly for users without a strong technical background. Existing community tools such as ESMValTool [5] or Pangeo address aspects of this challenge, but are often complex to configure, computationally heavy, or optimized for global workflows rather than regional applications.

Regional climate models (RCMs) downscale coarse global climate model outputs to higher spatial resolutions, typically on the order of 10 km. While this approach improves the representation of regional and local climate phenomena, systematic biases remain, with simulations tending to be too cold, too wet, and too windy [6]. For Austria, the ÖKS15 climate dataset has become a widely used national reference for climate change studies. ÖKS15 was generated by RCMs driven by coarse global climate model simulations, followed by statistical downscaling to 1 km horizontal resolution. Similarly, other regional datasets, such as CH2025 for Switzerland [7], provide high-resolution climate information for national studies. Matching other datasets to these specific regional formats and grids requires several technical steps, including, among others, spatial interpolation and masking to the geographic domain of interest.

Recently, initiatives such as Destination Earth (DestinE) [8] and nextGEMS [9] have begun to generate multi-decadal global simulations directly at kilometer-scale resolutions using state-of-the-art Earth system models such as the ICOsahedral Nonhydrostatic (ICON) model and the Integrated Forecasting System (IFS). These global km-scale models represent a major step forward compared to traditional climate model hierarchies: they explicitly resolve small-scale atmospheric, land, and ocean processes, including convection, topography-driven circulations, and mesoscale ocean eddies. These processes previously required physical parameterizations. The resulting simulations provide global climate information with spatial detail comparable to traditional regional downscaling approaches. While this opens up new opportunities for regional climate research, it also introduces significant challenges in terms of data volume, heterogeneity, and access.

ClimXtract bridges global and regional data sources to simplify access to km-scale climate data. It addresses the technical challenges described above with an easy-to-use and customizable solution that:

unifies access to observational, model, and reanalysis data,
remaps diverse spatial grids to a consistent, high-resolution target grid,
resolves inconsistencies in variable naming, units, and metadata, and
supports a reproducible and modular workflow.

Implementation and Architecture

Installing ClimXtract is straightforward. To handle the dependencies of ClimXtract, an installation within a dedicated environment is recommended to avoid conflicts with other Python packages. Using Conda, this can be done as follows:

# create conda environment named 'climxtract'
conda create -n climxtract -c conda-forge -y python=3.10

# update the environment using specifications in
# environment.yml (run from the directory containing
# the file, e.g. a clone of the repository)
conda env update -n climxtract -f environment.yml

# activate the 'climxtract' environment
conda activate climxtract

Because ClimXtract is already included in the environment specification, no additional installation via pip is needed. This setup ensures that all dependencies are properly installed and isolated, making the toolkit ready to be used immediately. Following installation, the package can be imported in Python in the usual manner:

import climxtract as cxt

ClimXtract provides a modular pipeline for preparing high-resolution climate datasets for regional analysis. Its three capabilities are downloading, regridding, and masking, all designed to support interoperability and reproducibility. Although developed with Austria and the ÖKS15 dataset in mind, the toolkit can be configured to work with any user-defined target grid. In this paper, we demonstrate its use with ÖKS15 as the target dataset.

Despite its modular design, some constraints apply. The toolkit is currently configured for surface temperature and precipitation, as these are among the most frequently used variables in climate science. Adding new variables requires careful handling of naming conventions, metadata, and units. To support such extensions, ClimXtract stores all variable-specific information in a central configuration dictionary (dictionary.py) that defines the expected variable names, units, and supported datasets. Users can add new variables by updating the dictionary without modifying core functions, for example for maximum temperature:

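from climxtract.dictionary import dictionary

# register 'tasmax' as a variable if it is not yet known
dictionary.setdefault('tasmax', {})
# expected variable name and units for the ÖKS15 dataset
dictionary['tasmax']['oeks15'] = {'name': 'tasmax',
                                  'units': ['Celsius']}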
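
To illustrate how a central dictionary can drive unit handling, the following minimal sketch pairs a small dictionary, modeled loosely on the tasmax example above, with a hypothetical harmonize() helper. The names EXPECTED, CONVERSIONS, and harmonize, the 'mm/day' unit string, and the conversion table are assumptions for illustration only; they are not ClimXtract's actual dictionary.py or conversion code (the package lists cf-units among its dependencies, which provides general unit conversion).

```python
# Hypothetical sketch: a variable dictionary in the spirit of dictionary.py
# plus a helper that converts values to one of the expected units.
from typing import Callable, Dict, List, Tuple

# Expected names and units per variable and dataset (assumed structure).
EXPECTED: Dict[str, Dict[str, dict]] = {
    "tas": {"oeks15": {"name": "tas", "units": ["Celsius"]}},
    "pr": {"oeks15": {"name": "pr", "units": ["mm/day"]}},
}

# Conversions from common source units to expected units (illustrative only).
CONVERSIONS: Dict[Tuple[str, str], Callable[[float], float]] = {
    ("K", "Celsius"): lambda x: x - 273.15,
    # 1 kg m-2 of water corresponds to 1 mm depth; 86400 seconds per day
    ("kg m-2 s-1", "mm/day"): lambda x: x * 86400.0,
}


def harmonize(values: List[float], variable: str, source_units: str,
              dataset: str = "oeks15") -> List[float]:
    """Return values converted to a unit accepted for variable/dataset."""
    accepted = EXPECTED[variable][dataset]["units"]
    if source_units in accepted:
        return list(values)  # already in an accepted unit
    for unit in accepted:
        convert = CONVERSIONS.get((source_units, unit))
        if convert is not None:
            return [convert(v) for v in values]
    raise ValueError(f"cannot convert {source_units!r} to any of {accepted}")


# Extending the dictionary with a new variable, as in the example above.
EXPECTED.setdefault("tasmax", {})
EXPECTED["tasmax"]["oeks15"] = {"name": "tasmax", "units": ["Celsius"]}
```

Because the new tasmax entry reuses the existing "K" to "Celsius" conversion, a call such as harmonize([300.0], "tasmax", "K") works without changing the helper, which is the kind of extension-by-configuration the text describes.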

However, integrating a new dataset is inherently more complex. Each dataset follows its own conventions for data access, file structure, and grid description, and therefore requires the user to introduce a new dedicated loading or download function. Further remarks on this are provided in the section “Reuse Potential.”

1. Downloading

ClimXtract includes dedicated download functions to access and retrieve data from seven major climate datasets relevant for the Austrian domain:

Table 1

Overview of climate datasets included in ClimXtract.

DATASET	TYPE	SPATIAL RES.	TIME RES.	COVERAGE
ÖKS15	Model	1 km	daily	Austria
SPARTACUS	Observation	1 km	daily	Austria
EURO-CORDEX	Model	12.5 km	daily	Europe
E-OBS	Observation	11 km	daily	Europe
DestinE Climate DT	Model	5–10 km	hourly	Global
ERA5	Reanalysis	30 km	hourly/daily	Global
ERA5-Land	Reanalysis	9 km	hourly/daily	Global

These datasets differ in format, temporal and spatial resolution, and access method. ClimXtract handles this heterogeneity through automated interfaces such as wget, cdsapi, the ESGF PyClient, and the Polytope Client. For example, temperature data from the ICON model in DestinE is accessed via the Polytope Client, while ÖKS15 and SPARTACUS data for Austria are retrieved from HTTP file listings on the GeoSphere Austria Data Hub. Similarly, EURO-CORDEX data is obtained through ESGF nodes, E-OBS data via the European Climate Assessment & Dataset (ECA&D) project, and the global reanalyses ERA5 and ERA5-Land through the Copernicus Climate Data Store (CDS). In all cases, these interfaces are called behind the scenes by a single load() function:

# example output directories (placeholder paths; adjust as needed)
output_path_oeks15 = 'data/oeks15'
output_path_destine = 'data/destine'

# load the dataset for 'tas' variable of 'oeks15' type
t_oeks15 = cxt.load(type='oeks15',
                    model_global='MPI-M-MPI-ESM-LR',
                    model_regional='SMHI-RCA4', resolution=None,
                    variable='tas', experiment='rcp45',
                    ens='r1i1p1', start=None, end=None,
                    output_path=output_path_oeks15)

# load the dataset for 'tas' variable of 'destine' type
t_destine = cxt.load(type='destine',
                     model_global='ICON', model_regional=None,
                     resolution=None, variable='tas',
                     experiment='SSP3-7.0', ens=None,
                     start='20210101', end='20231231',
                     output_path=output_path_destine)
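
A front end like load() can keep dataset-specific access logic out of user code by dispatching on its type argument. The sketch below shows one way to do this with a registry. The register() decorator, the backend functions, and the file names they return are hypothetical stand-ins that download nothing. The assumption that load() returns a list of file paths is inferred only from the [0] indexing used in the examples.

```python
# Hypothetical sketch of a registry-based load() front end; the backends
# only build file names and do not perform any download.
from typing import Callable, Dict, List

LOADERS: Dict[str, Callable[..., List[str]]] = {}


def register(dataset_type: str):
    """Decorator that registers a backend function under a dataset type."""
    def decorator(func):
        LOADERS[dataset_type] = func
        return func
    return decorator


@register("oeks15")
def _load_oeks15(variable: str, output_path: str, **kwargs) -> List[str]:
    # A real backend would list and fetch files from the GeoSphere Austria Data Hub.
    return [f"{output_path}/oeks15_{variable}.nc"]


@register("destine")
def _load_destine(variable: str, output_path: str, **kwargs) -> List[str]:
    # A real backend would submit a request through the Polytope client.
    return [f"{output_path}/destine_{variable}.nc"]


def load(type: str, variable: str, output_path: str, **kwargs) -> List[str]:
    """Dispatch to the registered backend (parameter named `type` as in the text)."""
    backend = LOADERS.get(type)
    if backend is None:
        raise ValueError(f"unsupported type {type!r}; choose from {sorted(LOADERS)}")
    return backend(variable=variable, output_path=output_path, **kwargs)
```

With this pattern, supporting a new dataset means writing and registering one more backend function while the calling code stays unchanged. Whether ClimXtract itself uses a registry internally is not stated in this paper.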

As an example, Figure 1 shows the near-surface air temperature averaged over the years 2021 to 2023, as simulated by the ICON model for the DestinE Climate Digital Twin.

Figure 1. Global map of mean near-surface air temperature averaged over the years 2021 to 2023, as simulated by the ICON model for the DestinE Climate Digital Twin.

2. Regridding

Climate datasets from different sources typically come on different horizontal grids (regular lat-lon, rotated pole, Lambert conformal, HEALPix, etc.). ClimXtract offers automated regridding to a user-defined target grid. In our example, we regrid to the ÖKS15 grid (Lambert conformal conic, 1 km resolution), but this can be adapted for any custom NetCDF grid file.

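# placeholder output directory for regridded files
output_path_regridded = 'data/regridded'

# regrid the destine dataset to the oeks15 grid using
# distance-weighted interpolation
t_regrid_destine = cxt.regrid(type='distance',
                              target_file=t_oeks15[0],
                              input_file=t_destine[0],
                              output_path_regrid=output_path_regridded)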

Supported interpolation methods include:

nearest neighbor (remapnn) – robust for sparse or irregular grids (e.g., HEALPix),
bilinear (remapbil) – smooth field interpolation for continuous variables,
conservative (remapcon) – preserves area-integrated quantities (e.g., precipitation rates and radiative fluxes), and
distance-weighted (remapdis) – smoother alternative when cell geometry is not well defined.
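
To show what distance-weighted remapping computes, the following NumPy sketch averages the k nearest source values, weighting each by the inverse of its distance to the target point. The default k=4 matches the four nearest neighbors that CDO's remapdis uses. The brute-force neighbor search and the chord-distance weights are simplifications chosen for clarity. This is not how CDO implements the operator.

```python
import numpy as np


def _to_xyz(lon, lat):
    """Convert lon/lat in degrees to unit vectors on the sphere."""
    lon, lat = np.radians(lon), np.radians(lat)
    return np.stack([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)], axis=-1)


def idw_remap(src_lon, src_lat, src_val, tgt_lon, tgt_lat, k=4):
    """Inverse-distance-weighted mean of the k nearest source points.

    Brute-force search: memory grows with n_target * n_source, so this is
    only suitable for small illustrative grids.
    """
    src = _to_xyz(np.ravel(src_lon), np.ravel(src_lat))   # (n_src, 3)
    tgt = _to_xyz(np.ravel(tgt_lon), np.ravel(tgt_lat))   # (n_tgt, 3)
    vals = np.ravel(src_val).astype(float)
    # Chord distance between every target point and every source point.
    dist = np.linalg.norm(tgt[:, None, :] - src[None, :, :], axis=-1)
    idx = np.argsort(dist, axis=1)[:, :k]                  # k nearest, ascending
    d = np.take_along_axis(dist, idx, axis=1)
    v = vals[idx]
    out = np.empty(len(tgt))
    exact = d[:, 0] < 1e-12                                # target on a source point
    out[exact] = v[exact, 0]
    w = 1.0 / d[~exact]
    out[~exact] = (w * v[~exact]).sum(axis=1) / w.sum(axis=1)
    return out.reshape(np.shape(tgt_lat))
```

Unlike conservative remapping, this scheme needs only point coordinates, not cell boundaries. That is why it is a practical choice when cell geometry is not well defined.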

The interpolation methods are implemented through a Python wrapper for the CDO command-line interface. This wrapper does not include the CDO binary itself, but the binary is installed automatically when the ClimXtract conda environment is set up from the environment.yml file as described above. The wrapper also works with any other existing CDO installation, so users can substitute a different binary if desired. Users specify the source and target grids and select the interpolation method via the keyword argument type.
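
Because the remapping is delegated to CDO, each method ultimately corresponds to a command of the form cdo remapdis,<target> <infile> <outfile>, where the target can be a NetCDF file whose grid is used. The sketch below builds and runs such a command with the standard library. The method keys other than 'distance' and the helper names are assumptions. Calling subprocess directly is also a simplification: ClimXtract goes through the Python cdo wrapper, where the equivalent call has the form Cdo().remapdis(target, input=infile, output=outfile).

```python
import shutil
import subprocess

# Methods named in the text mapped to CDO remapping operators.
# Only 'distance' appears in the example above; the other keys are assumed.
CDO_OPERATORS = {
    "nearest": "remapnn",
    "bilinear": "remapbil",
    "conservative": "remapcon",  # needs grid cell bounds on both grids
    "distance": "remapdis",
}


def cdo_remap_command(method, target_grid, infile, outfile):
    """Build the argument list for `cdo <operator>,<target_grid> <infile> <outfile>`."""
    try:
        operator = CDO_OPERATORS[method]
    except KeyError:
        raise ValueError(f"unknown method {method!r}; choose from {sorted(CDO_OPERATORS)}") from None
    return ["cdo", f"{operator},{target_grid}", infile, outfile]


def run_remap(method, target_grid, infile, outfile):
    """Run the remapping with whichever CDO binary is found on PATH."""
    if shutil.which("cdo") is None:
        raise RuntimeError("CDO binary not found on PATH")
    subprocess.run(cdo_remap_command(method, target_grid, infile, outfile), check=True)
    return outfile
```

Looking the binary up on PATH at call time reflects the point made above: any existing CDO installation can be used, not only the one installed with the conda environment.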

3. Masking

After regridding, ClimXtract offers a masking functionality to apply a spatial domain mask from any target dataset. This ensures spatial consistency across datasets, removes unwanted edge regions, and aligns the data with the target analysis domain. In our example, masking is again based on the ÖKS15 grid, which defines the Austrian domain by means of NaN values outside of Austria. The masking step is implemented using xarray.where, making it efficient and compatible with NetCDF workflows.

As an example, the following call masks the DestinE data, which originate on a HEALPix grid [10] and were regridded to the ÖKS15 grid above, to the ÖKS15 target domain:

# placeholder output directory for masked files
output_path_masked = 'data/masked'

# apply the oeks15 spatial mask to the regridded
# destine dataset
t_mask_destine = cxt.mask(target_grid=t_oeks15[0],
                          input_grid=t_regrid_destine[0],
                          output_path_mask=output_path_masked)
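
At its core, masking keeps values only where the target field is defined. The sketch below expresses this with numpy.where on plain arrays. With xarray, the same operation can be written as data.where(target.notnull()). The function name apply_target_mask is hypothetical, and the sketch assumes the mask is taken from a single 2-D (y, x) slice of the target dataset.

```python
import numpy as np


def apply_target_mask(target_2d, data):
    """Return data with NaN wherever the 2-D target field is NaN.

    target_2d: one (y, x) slice of the target dataset, NaN outside the domain.
    data: array on the same grid, shaped (y, x) or (time, y, x).
    """
    target_2d = np.asarray(target_2d, dtype=float)
    data = np.asarray(data, dtype=float)
    if target_2d.ndim != 2 or data.shape[-2:] != target_2d.shape:
        raise ValueError("data must already be regridded to the target grid")
    inside = ~np.isnan(target_2d)          # True within the target domain
    return np.where(inside, data, np.nan)  # broadcasts over a leading time axis
```

The shape check reflects the order of operations in the workflow: masking is only meaningful after the data have been regridded onto the target grid.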

Figure 2 illustrates how the ICON data from DestinE shown in Figure 1 appear after regridding to the ÖKS15 grid and masking to the Austrian domain. After masking, only grid cells within the Austrian domain remain. Comparing the processed ICON data in panel (c) with the high-resolution ÖKS15 data in panel (a) shows that both cover the same domain but differ in local detail because of their different native resolutions.

Figure 2. Mean near-surface air temperature averaged over the years 2021 to 2023: (a) ÖKS15 dataset, which serves as the target grid; (b) ICON data from DestinE after regridding using distance-weighted interpolation; (c) the regridded data after applying the ÖKS15 spatial mask.

The same workflow can be applied to any combination of source and target grids. By harmonizing spatial resolution and geographic extent, ClimXtract enables consistent comparison between different datasets.

Quality control

ClimXtract has been tested through a combination of functional tests and example-based validation, implemented as Jupyter notebooks and an automated test script. The notebooks guide users through downloading datasets from different sources, regridding them to a target grid, and applying spatial masks. Each notebook concludes with a validation step that averages over time and/or space to produce visual comparisons of the selected climate variable as spatial maps and time series (Figure 3).

Figure 3. Time series of near-surface air temperature averaged over the Austrian domain for September 2020. Dashed lines show observation- and reanalysis-based datasets (SPARTACUS, E-OBS, ERA5-Land); solid lines show model simulations.

To provide an automated check independent of external data access, the repository also includes a test script that uses example datasets from September 2020. This script regrids and masks subsets of the datasets described in Table 1, computes spatial means, and compares the results against pre-computed values stored in a NetCDF file. Together, the notebooks and the test script provide users with clear guidance while ensuring that ClimXtract produces consistent and reproducible results for further data processing.
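
The automated check described above can be sketched in two steps. First, average each masked field over the spatial dimensions. Second, compare the resulting series with stored reference values within a tolerance. The function names and tolerances below are hypothetical. For simplicity, the reference values are passed in directly rather than read from the NetCDF file used by the actual test script.

```python
import numpy as np


def domain_mean_series(masked):
    """Spatial mean over the last two (y, x) axes, ignoring NaNs outside the domain."""
    return np.nanmean(np.asarray(masked, dtype=float), axis=(-2, -1))


def check_against_reference(values, reference, rtol=1e-6, atol=1e-8):
    """Raise AssertionError if computed means deviate from the reference values."""
    values = np.asarray(values, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if values.shape != reference.shape:
        raise ValueError(f"shape mismatch: {values.shape} vs {reference.shape}")
    close = np.isclose(values, reference, rtol=rtol, atol=atol)
    if not close.all():
        bad = np.flatnonzero(~close).tolist()
        raise AssertionError(f"mismatch at time indices {bad}")
    return True
```

An unweighted mean is a reasonable approximation on a projected grid with uniform 1 km spacing such as ÖKS15. On a regular latitude-longitude grid, cells would need area weights, for example proportional to the cosine of latitude.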

(2) Availability

Operating system

Windows (64-bit) and Linux (64-bit)

Programming language

Python version 3.10 or above.

Additional system requirements

No additional requirements.

Dependencies

Python packages: cdo, cdsapi, cf-units, cfgrib, conflator, eccodes, esgf-pyclient, lxml, matplotlib, netcdf4, numpy, polytope-client, rasterio, wget, xarray, pystac, pystac-client

List of contributors

N/A

Software location

Archive

Name: Zenodo
Persistent identifier: https://doi.org/10.5281/zenodo.17956334
Licence: GNU General Public License v3.0
Publisher: Maximilian Meindl
Version published: version 1.1.3
Date published: 16/12/2025

Code repository

Name: GitHub
Persistent identifier: https://github.com/meindlm97/ClimXtract
Licence: GNU General Public License v3.0
Date published: 31/07/2025

Language

English

(3) Reuse potential

ClimXtract was developed as part of the Austrian Climate Research Programme (ACRP) project HighResLearn. One goal of HighResLearn is to support the Austrian climate research community in efficiently accessing and processing high-resolution global climate model data alongside national-scale reference datasets such as ÖKS15. By standardizing diverse datasets onto a common grid, ClimXtract enables reproducible workflows that are essential for regional climate research and downstream applications. In particular, the toolkit lays the foundation for machine learning-based analysis of climate model performance and multi-model comparison at regional scales [11]. While initially developed for the Austrian domain and in cooperation with the klimaszenarien.at initiative for Austrian national climate change scenarios, the software is not limited to this region. All processing steps are configurable to work with any user-defined target grid, leading to high reuse potential in other countries or regional climate initiatives.

To illustrate this, the present version of ClimXtract includes support for CH2025 [7], the new national climate scenarios for Switzerland. This provides a concrete example of how the workflow can be transferred to other countries and their datasets. As with any dataset, CH2025 requires a dedicated loading module (ch2025_download.py) that handles data access and naming conventions. Once such an interface is provided, the dataset can be added to the central configuration dictionary and used seamlessly throughout the workflow, for instance, as a target grid for regridding and masking. This demonstrates how users can adapt ClimXtract to additional datasets or regions.

The ClimXtract package is actively maintained as an open-source project hosted on GitHub by members of the HighResLearn team. For community support and collaboration, the project uses GitHub Issues (https://github.com/meindlm97/ClimXtract/issues) as the primary communication tool. Users are encouraged to request support, report bugs, propose enhancements, and share their own use cases. The maintainers actively monitor and respond to issues, providing technical guidance while ensuring that all discussions remain publicly visible and searchable, thereby building a shared knowledge base for the broader climate data community.

Acknowledgements

We gratefully acknowledge GeoSphere Austria for providing access to the ÖKS15 and SPARTACUS datasets, and the climate data community for maintaining open-access resources such as EURO-CORDEX, E-OBS, ERA5, and Destination Earth.

ClimXtract builds on a number of open-source tools and libraries, including cdo, cdsapi, ESGF PyClient, numpy, polytope-client, wget, and xarray. We thank the respective developer communities for their contributions and continued maintenance. We also thank our colleagues and early users of ClimXtract for their valuable feedback and testing.

Competing Interests

The authors have no competing interests to declare.