ERA5 Ingestion¶
Overview¶
Era5Ingest converts ERA5 NetCDF files from the NCAR ERA5 archive to cfdb. ERA5 files from NCAR have one variable per file, with surface products as monthly files and pressure level products as daily files.
Python API¶
Basic conversion¶
from cfdb_ingest import Era5Ingest

era5 = Era5Ingest('/path/to/era5/*.nc')
era5.convert(
    cfdb_path='output.cfdb',
    variables=['SP', 'VAR_2T', 'T'],
    start_date='2020-01-01',
    end_date='2020-01-31',
)
Input paths¶
ERA5 files can be provided as a single path, a list of paths, or one or more directories:
# Single file
era5 = Era5Ingest('e5.oper.an.sfc.128_134_sp.ll025sc.2020010100_2020013123.nc')
# Multiple files
era5 = Era5Ingest([
    'e5.oper.an.sfc.128_134_sp.ll025sc.2020010100_2020013123.nc',
    'e5.oper.an.pl.128_130_t.ll025sc.2020010100_2020010123.nc',
])
# Directory (finds all *.nc files)
era5 = Era5Ingest('/path/to/era5/')
# Multiple directories
era5 = Era5Ingest(['/path/to/sfc/', '/path/to/pl/'])
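The example file names above encode the level type (sfc or pl), the variable short name, and the time span of each file. The following is a minimal sketch of reading those fields from a file name. The regular expression and the `parse_era5_name` helper are assumptions inferred from the two example names; they are not part of cfdb_ingest and may not cover every file in the archive.

```python
import re
from datetime import datetime

# Pattern inferred from the example file names; an assumption, not a cfdb_ingest API.
_NAME_RE = re.compile(
    r"^e5\.oper\.(?P<stream>[a-z0-9]+)\.(?P<level_type>[a-z]+)\."
    r"(?P<table>\d+)_(?P<code>\d+)_(?P<short_name>[a-z0-9]+)\."
    r"(?P<grid>[a-z0-9]+)\.(?P<start>\d{10})_(?P<end>\d{10})\.nc$"
)


def parse_era5_name(filename):
    """Split an ERA5 file name into its fields (hypothetical helper)."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognised ERA5 file name: {filename}")
    info = match.groupdict()
    # Time stamps are encoded as YYYYMMDDHH.
    info["start"] = datetime.strptime(info["start"], "%Y%m%d%H")
    info["end"] = datetime.strptime(info["end"], "%Y%m%d%H")
    return info


sfc = parse_era5_name("e5.oper.an.sfc.128_134_sp.ll025sc.2020010100_2020013123.nc")
pl = parse_era5_name("e5.oper.an.pl.128_130_t.ll025sc.2020010100_2020010123.nc")
print(sfc["level_type"], sfc["short_name"], sfc["start"], sfc["end"])
print(pl["level_type"], pl["short_name"], pl["start"], pl["end"])
```

The surface file spans a full month and the pressure level file a single day, matching the layout described in the overview.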
Spatial subsetting¶
era5.convert(
    cfdb_path='output.cfdb',
    variables=['SP'],
    bbox=(170.0, -40.0, 175.0, -35.0),  # (min_lon, min_lat, max_lon, max_lat)
)
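To illustrate what a bounding box selects on a regular 0.25° grid, here is a small NumPy sketch. It assumes ascending 1-D longitude and latitude arrays, longitudes in the 0 to 360 range, a box that does not cross the longitude seam, and inclusive box edges. The `bbox_slices` helper is hypothetical; how cfdb_ingest performs the subsetting internally is not documented here.

```python
import numpy as np


def bbox_slices(lon, lat, bbox):
    """Return (y_slice, x_slice) covering bbox on ascending 1-D lon/lat arrays."""
    min_lon, min_lat, max_lon, max_lat = bbox
    # side="left"/"right" keeps grid points that sit exactly on the box edges.
    x0 = int(np.searchsorted(lon, min_lon, side="left"))
    x1 = int(np.searchsorted(lon, max_lon, side="right"))
    y0 = int(np.searchsorted(lat, min_lat, side="left"))
    y1 = int(np.searchsorted(lat, max_lat, side="right"))
    return slice(y0, y1), slice(x0, x1)


# Assumed 0.25 degree global grid with ascending latitude.
lon = np.arange(0.0, 360.0, 0.25)
lat = np.arange(-90.0, 90.25, 0.25)

y_sl, x_sl = bbox_slices(lon, lat, (170.0, -40.0, 175.0, -35.0))
sub_lon = lon[x_sl]
sub_lat = lat[y_sl]
print(sub_lon[0], sub_lon[-1], sub_lon.size)
print(sub_lat[0], sub_lat[-1], sub_lat.size)
```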
Pressure level variables¶
Pressure levels are auto-detected from the source files. You can also specify them explicitly:
era5.convert(
    cfdb_path='output.cfdb',
    variables=['T', 'U', 'V'],
    target_levels=[100000, 85000, 70000, 50000],  # Pa
)
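Pressure levels are often quoted in hPa, while target_levels above is given in Pa. A small conversion sketch follows; the `hpa_to_pa` helper is hypothetical and not part of cfdb_ingest.

```python
def hpa_to_pa(levels_hpa):
    """Convert pressure levels from hPa to integer Pa (1 hPa = 100 Pa)."""
    return [int(round(level * 100)) for level in levels_hpa]


target_levels = hpa_to_pa([1000, 850, 700, 500])
print(target_levels)
```

The resulting list can be passed as target_levels in the call above.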
Combined vs split output¶
Combined (default): All variables in one cfdb file.
Split: One cfdb file per variable, written to a directory.
era5.convert(
    cfdb_path='/output/dir/',
    variables=['SP', 'VAR_2T', 'T'],
    split=True,
)
# Creates: surface_pressure.cfdb, air_temperature.cfdb (for T)
Variable name resolution¶
variables accepts mapping keys (e.g. SP, VAR_2T, T), source variable names, or cfdb names:
era5.resolve_variables(['surface_pressure']) # ['SP']
era5.resolve_variables(['air_temperature']) # ['VAR_2T', 'T'] (both surface and pl)
era5.resolve_variables(None) # all available
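The lookup behaviour shown above can be pictured as a reverse search over a table of mapping keys and cfdb names. This is only a sketch: the `MAPPINGS` table is a hypothetical three-entry subset, the `resolve` function is not the library's implementation, and source variable names are left out for brevity.

```python
# Hypothetical subset of a mapping table: mapping key -> cfdb name.
MAPPINGS = {
    "SP": "surface_pressure",
    "VAR_2T": "air_temperature",
    "T": "air_temperature",
}


def resolve(names, mappings=MAPPINGS):
    """Return mapping keys for the given mapping keys or cfdb names."""
    if names is None:
        return list(mappings)  # everything available
    keys = []
    for name in names:
        if name in mappings:
            matches = [name]
        else:
            # A cfdb name may map back to several keys (surface and pressure level).
            matches = [k for k, cfdb_name in mappings.items() if cfdb_name == name]
        if not matches:
            raise KeyError(f"unknown variable: {name}")
        for key in matches:
            if key not in keys:
                keys.append(key)
    return keys


print(resolve(["surface_pressure"]))
print(resolve(["air_temperature"]))
print(resolve(None))
```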
Inspecting metadata¶
era5 = Era5Ingest('/path/to/era5/')
era5.crs # pyproj.CRS (always EPSG:4326)
era5.times # numpy datetime64 array (union across all files)
era5.x # longitude array
era5.y # latitude array (ascending)
era5.variables # dict of available variable mappings
era5.bbox_geographic # (min_lon, min_lat, max_lon, max_lat)
Height Coordinates¶
Surface variables are stored with named height coordinates indicating their measurement height:
| Height | Coordinate | Example variables |
|---|---|---|
| 0 m | height_0m | SP, MSL, SSTK, CI, snow fields, albedo, all invariant vars |
| 2 m | height_2m | VAR_2T, VAR_2D |
| 10 m | height_10m | VAR_10U, VAR_10V, U10N, V10N |
| 100 m | height_100m | VAR_100U, VAR_100V |
Pressure level variables use a pressure coordinate with axis='Z'. Named height coordinates do not have axis='Z' when a pressure coordinate is also present.
When a variable name conflicts between surface and pressure levels (e.g. air_temperature from both VAR_2T and T), the surface variant is suffixed: air_temperature_2m.
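The suffix rule can be sketched as follows. The `REQUESTED` list and `output_names` function are hypothetical, and deriving the suffix from the height coordinate name is an assumption that happens to reproduce the air_temperature_2m example; the library may build the name differently.

```python
# Hypothetical entries: (mapping key, cfdb name, height coordinate or None for pressure levels).
REQUESTED = [
    ("SP", "surface_pressure", "height_0m"),
    ("VAR_2T", "air_temperature", "height_2m"),
    ("T", "air_temperature", None),
]


def output_names(requested):
    """Suffix surface variables whose cfdb name is also used on pressure levels."""
    pl_names = {name for _, name, height in requested if height is None}
    out = {}
    for key, name, height in requested:
        if height is not None and name in pl_names:
            # Assumed rule: 'height_2m' contributes the suffix '2m'.
            out[key] = f"{name}_{height.split('_', 1)[1]}"
        else:
            out[key] = name
    return out


names = output_names(REQUESTED)
print(names)
```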
Geopotential Transform¶
The Z variable in ERA5 is geopotential (m² s⁻²), not geopotential height (m). When ingested:
- Pressure level Z (Z_PL) is converted to geopotential_height by dividing by g (9.80665 m s⁻²)
- Invariant Z (Z_INV) is converted to terrain_height by dividing by g
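The conversion itself is a single division. A minimal NumPy sketch, with a hypothetical `geopotential_to_height` helper and made-up input values:

```python
import numpy as np

G0 = 9.80665  # standard gravity, m s-2


def geopotential_to_height(z):
    """Convert geopotential (m2 s-2) to height (m) by dividing by g."""
    return np.asarray(z, dtype=float) / G0


# Illustrative geopotential values, not taken from a real ERA5 file.
heights = geopotential_to_height([0.0, 9806.65, 49033.25])
print(heights)
```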
VIMF Computation¶
Vertically Integrated Moisture Flux (VIMF) can be computed natively during ingestion from specific humidity (Q) and wind components (U, V).
VIMF is computed using a vectorized trapezoidal integration across all available pressure levels:
$$ VIMF = \frac{1}{g} \int q \vec{v} dp $$
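As an illustration of the integral, here is a vectorized trapezoidal sketch in NumPy. It assumes arrays shaped (level, y, x) and pressure in Pa, and it sorts levels by increasing pressure so that dp is positive. The `vimf` function is a hypothetical stand-in, not the code used by cfdb_ingest, whose array layout and sign convention may differ.

```python
import numpy as np

G0 = 9.80665  # standard gravity, m s-2


def vimf(q, u, v, p):
    """Trapezoidal (1/g) * integral of q*u and q*v over pressure.

    q, u, v: arrays shaped (level, y, x); p: 1-D pressure levels in Pa.
    Returns (vimf_u, vimf_v), each shaped (y, x), in kg m-1 s-1.
    """
    q, u, v = (np.asarray(a, dtype=float) for a in (q, u, v))
    p = np.asarray(p, dtype=float)
    order = np.argsort(p)  # integrate from low to high pressure
    p = p[order]
    qu = (q * u)[order]
    qv = (q * v)[order]
    dp = np.diff(p)[:, None, None]  # layer thickness, broadcast over y and x
    vimf_u = np.sum(0.5 * (qu[1:] + qu[:-1]) * dp, axis=0) / G0
    vimf_v = np.sum(0.5 * (qv[1:] + qv[:-1]) * dp, axis=0) / G0
    return vimf_u, vimf_v


# Constant profiles make the expected value easy to check by hand.
p = [100000, 85000, 70000, 50000]
q = np.full((4, 2, 3), 0.01)
u = np.full((4, 2, 3), 10.0)
v = np.full((4, 2, 3), -5.0)
flux_u, flux_v = vimf(q, u, v, p)
print(flux_u.shape, flux_u[0, 0], flux_v[0, 0])
```

With constant profiles the integral reduces to q·u·Δp/g over the 50000 Pa column, which gives a quick sanity check of the implementation.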
Usage¶
Include VIMF_U and VIMF_V in your variables list. Ensure that the source files for Q, U, and V are included in the input paths.
era5.convert(
    cfdb_path='era5_vimf.cfdb',
    variables=['VIMF_U', 'VIMF_V'],
    start_date='2020-01-01',
)
The resulting variables will be named vimf_u and vimf_v in the cfdb dataset, with a vertical coordinate of height_0m.
CLI¶
Basic usage¶
The command takes the input paths, then the output path, then any options:
cfdb-ingest era5 INPUT_PATHS... OUTPUT_PATH [OPTIONS]
Options¶
| Option | Short | Description |
|---|---|---|
| --variables | -v | Comma-separated variable names |
| --split | | Create one cfdb file per variable |
| --start-date | -s | Start date (ISO format) |
| --end-date | -e | End date (ISO format) |
| --bbox | -b | Bounding box: min_lon,min_lat,max_lon,max_lat |
| --target-levels | -l | Comma-separated pressure levels in Pa (auto-detected if omitted) |
| --chunk-shape | -c | Output chunk shape: time,z,y,x (e.g. 1,1,50,50) |
| --compression | | Compression algorithm: zstd or lz4 |
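Several options take comma-separated values that correspond to Python lists or tuples in the API. The sketch below shows one way such strings could be turned into those types. The `parse_csv` and `parse_bbox` helpers are hypothetical and do not describe how cfdb-ingest parses its arguments.

```python
def parse_csv(text, cast=str):
    """Split a comma-separated option value and cast each item."""
    return [cast(part.strip()) for part in text.split(",") if part.strip()]


def parse_bbox(text):
    """Parse 'min_lon,min_lat,max_lon,max_lat' into a tuple of floats."""
    parts = tuple(parse_csv(text, float))
    if len(parts) != 4:
        raise ValueError("bbox needs four values: min_lon,min_lat,max_lon,max_lat")
    if parts[1] > parts[3]:
        raise ValueError("min_lat must not exceed max_lat")
    return parts


variables = parse_csv("SP,VAR_2T,T")
bbox = parse_bbox("170.0,-40.0,175.0,-35.0")
levels = parse_csv("100000,85000,70000,50000", int)
chunk_shape = tuple(parse_csv("1,1,50,50", int))
print(variables, bbox, levels, chunk_shape)
```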
Examples¶
# Surface variables only
cfdb-ingest era5 /path/to/sfc/*.nc output.cfdb \
    -v SP,VAR_2T,VAR_10U,VAR_10V

# Pressure level variables with spatial subset
cfdb-ingest era5 /path/to/pl/*.nc output.cfdb \
    -v T,U,V,Q -b 170.0,-40.0,175.0,-35.0

# Combined surface + pressure level
cfdb-ingest era5 /path/to/era5/*.nc output.cfdb \
    -v SP,VAR_2T,T,U,V

# Split mode: one cfdb per variable
cfdb-ingest era5 /path/to/era5/*.nc /output/dir/ --split \
    -v SP,VAR_2T,T