GH page with the entire catalogue and dashboards
This repository contains scripts to download, preprocess, standardize, and consolidate the catalogues available in the CDS.
The environment is the one used in the c3s-atlas user tools: https://github.com/ecmwf-projects/c3s-atlas/blob/main/environment.yml
| Directory | Contents |
|---|---|
| requests | Contains one CSV file per CDS catalogue, listing the requested variables, temporal resolution, interpolation method, the target save directory, and whether the variable is raw or requires post-processing to be standardized. |
| provenance | Contains one JSON file per catalogue describing the provenance and definitions of each variable. |
| scripts/download | Python scripts to download data from the CDS. |
| scripts/standardization | Python recipes to standardize the variables. |
| scripts/derived | Python recipes to calculate derived products from the variables. |
| scripts/interpolation | Python recipes to interpolate data using reference grids. |
| scripts/catalogue | Python recipes to produce the catalogues of downloaded data. |
| catalogues | CSV catalogues of datasets consolidated in Lustre or GPFS. The catalogues are updated through a nightly CI job. |
Most top-level folders include a local README.md with concise folder-level documentation.
The repository uses a structured directory path format to organize downloaded, derived, and interpolated data:
{base_path}/{product_type}/{dataset}/{temporal_resolution}/{interpolation}/{variable}/
Examples:
- Raw ERA5 hourly wind components:
/lustre/.../raw/reanalysis-era5-single-levels/hourly/native/u10/
Note: Interpolated data is stored under derived, with the interpolation field indicating the target grid (e.g., gr006). This distinguishes it from calculated variables, which use interpolation=native.
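The path format above can be sketched as a small helper. This is only an illustration: the function name `build_data_path` and the `/data` base path are assumptions, not part of the repository.

```python
from pathlib import PurePosixPath


def build_data_path(base_path, product_type, dataset,
                    temporal_resolution, interpolation, variable):
    """Join the six path components in the documented order.

    Hypothetical helper: the repository scripts may assemble paths differently.
    """
    return PurePosixPath(base_path, product_type, dataset,
                         temporal_resolution, interpolation, variable)


# "/data" is a placeholder base path used only for this example.
raw_dir = build_data_path("/data", "raw", "reanalysis-era5-single-levels",
                          "hourly", "native", "u10")
# Interpolated data goes under "derived" with the target grid as the interpolation field.
interp_dir = build_data_path("/data", "derived", "reanalysis-era5-single-levels",
                             "hourly", "gr006", "u10")
print(raw_dir)
print(interp_dir)
```

Keeping the component order in one function makes it harder for download, interpolation, and catalogue steps to drift apart on the layout.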
Files are named "{var}_{dataset}_{date}.nc", where date is:
- "{year}{month}" for large datasets such as CERRA, which are saved month by month (downloads are faster this way).
- "{year}" for all other datasets, which are saved year by year.
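As a minimal sketch of the naming rule, the helper below builds a file name for either case. The function name `build_file_name` is hypothetical, and the zero-padded two-digit month is an assumption; the repository may format dates differently.

```python
def build_file_name(var, dataset, year, month=None):
    """Return "{var}_{dataset}_{date}.nc" for yearly or monthly files.

    Hypothetical helper. Assumes a four-digit year and a zero-padded
    two-digit month, which the source does not state explicitly.
    """
    if month is None:
        date = f"{year:04d}"              # yearly file
    else:
        date = f"{year:04d}{month:02d}"   # monthly file (large datasets)
    return f"{var}_{dataset}_{date}.nc"


# Variable and dataset names here are illustrative only.
yearly = build_file_name("u10", "reanalysis-era5-single-levels", 1990)
monthly = build_file_name("t2m", "reanalysis-cerra-single-levels", 1990, month=3)
print(yearly)
print(monthly)
```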
Before downloading anything, you can create the complete folder structure without fetching or calculating any data:
# Preview what directories would be created (dry-run mode)
python scripts/create_folder_structure.py --dry-run
# Create all directories
python scripts/create_folder_structure.py

The script reads all CSV files in the requests/ directory and creates the directory structure according to the format:
{base_path}/{product_type}/{dataset}/{temporal_resolution}/{interpolation}/{variable}/
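The behaviour described above might look roughly like the following sketch. It is not the actual `scripts/create_folder_structure.py`: the CSV column names in `FIELDS`, the function names, and the `dry_run` parameter are all assumptions made for illustration, and the real request files may use different headers.

```python
import csv
from pathlib import Path

# Hypothetical column names; the real request CSVs may use different headers.
FIELDS = ("product_type", "dataset", "temporal_resolution",
          "interpolation", "variable")


def planned_directories(requests_dir, base_path):
    """Return the sorted, de-duplicated directories implied by the request CSVs."""
    directories = set()
    for csv_file in sorted(Path(requests_dir).glob("*.csv")):
        with csv_file.open(newline="") as handle:
            for row in csv.DictReader(handle):
                parts = [row[field] for field in FIELDS]
                directories.add(Path(base_path).joinpath(*parts))
    return sorted(directories)


def create_folder_structure(requests_dir, base_path, dry_run=False):
    """Create every planned directory, or only print them when dry_run is True."""
    directories = planned_directories(requests_dir, base_path)
    for directory in directories:
        if dry_run:
            print(f"[dry-run] would create {directory}")
        else:
            directory.mkdir(parents=True, exist_ok=True)
    return directories
```

Separating the planning step from the creation step keeps the dry-run output and the real run derived from the same list, so the preview cannot diverge from what is actually created.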