cfdb-ingest
Convert meteorological model output to cfdb with standardized CF conventions.
Overview
cfdb-ingest converts meteorological model output stored as netCDF4/HDF5 files into cfdb datasets. It standardizes variable names and attributes to follow CF conventions, so datasets from different sources can be used through a single interface.
Supported Sources
| Source | Class | Description |
|---|---|---|
| WRF (wrfout) | `WrfIngest` | All variables in one file per time range. Lambert Conformal, Polar Stereographic, Mercator, and Lat-Lon projections. |
| ERA5 (NCAR) | `Era5Ingest` | One variable per file. Surface, pressure level, and invariant products on a regular lat-lon grid (EPSG:4326). |
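Because ERA5 data sits on a regular lat-lon grid (EPSG:4326), a WGS84 bounding box can be turned into plain index slices before any data is read. The sketch below illustrates that idea with a hypothetical helper, `bbox_slices`, which is not part of the cfdb-ingest API. It assumes 1-D monotonic coordinates, longitudes and box in the same convention (e.g. both 0..360), and a box that does not cross the antimeridian.

```python
import numpy as np


def bbox_slices(lats, lons, min_lon, min_lat, max_lon, max_lat):
    """Return (lat_slice, lon_slice) covering a WGS84 box on a regular grid.

    Hypothetical helper for illustration only. Works with ascending or
    descending 1-D coordinates (ERA5 latitudes usually run 90 -> -90).
    Assumes lons and the box share one longitude convention and that the
    box does not cross the antimeridian.
    """
    lats = np.asarray(lats)
    lons = np.asarray(lons)
    lat_idx = np.nonzero((lats >= min_lat) & (lats <= max_lat))[0]
    lon_idx = np.nonzero((lons >= min_lon) & (lons <= max_lon))[0]
    if lat_idx.size == 0 or lon_idx.size == 0:
        raise ValueError("bounding box does not overlap the grid")
    # On a monotonic grid the matching indices are contiguous, so one
    # slice per axis selects exactly the points inside the box.
    lat_slice = slice(int(lat_idx[0]), int(lat_idx[-1]) + 1)
    lon_slice = slice(int(lon_idx[0]), int(lon_idx[-1]) + 1)
    return lat_slice, lon_slice
```

For example, on a 0.25 degree global grid with latitudes descending from 90 to -90 and longitudes from 0 to 359.75, `bbox_slices(lats, lons, 165, -48, 180, -34)` selects a 57 x 61 block. Slicing (rather than fancy indexing) keeps the read a single contiguous hyperslab per axis, which is what HDF5 handles most efficiently.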
Key Features
- Automatic variable mapping -- source variable names are translated to CF-standard names with proper metadata via cfdb-vars
- Named height coordinates -- surface variables at specific heights (0m, 2m, 10m, 100m) get their own named coordinates (e.g. `height_2m`), coexisting cleanly with pressure-level variables
- Wind rotation (WRF) -- grid-relative wind components are rotated to earth-relative components
- VIMF computation (ERA5) -- vertically integrated moisture flux is computed natively from specific humidity (Q) and wind components (U, V)
- 3D level interpolation (WRF) -- eta-level variables are interpolated to user-specified height or pressure levels
- Auto pressure level detection (ERA5) -- pressure levels are read directly from source files
- Split or combined output (ERA5) -- create one cfdb per variable or combine into a single dataset
- Soil variables -- soil moisture and temperature stored on a depth coordinate
- WPS intermediate file export -- convert cfdb datasets to WPS intermediate format for metgrid.exe
- Spatial and temporal filtering -- subset by bounding box (WGS84) and/or date range
- Multi-file support -- seamlessly spans multiple input files
- Configurable chunking -- tune output chunk shapes for different access patterns
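The WRF wind rotation listed above can be sketched with the standard grid-to-earth transform that uses the `SINALPHA` and `COSALPHA` fields written to wrfout files. This is an illustration of the technique under that assumption, not necessarily how `WrfIngest` implements it; `destagger` and `rotate_to_earth` are hypothetical names.

```python
import numpy as np


def destagger(arr: np.ndarray, axis: int) -> np.ndarray:
    """Average adjacent points along a staggered axis onto mass points."""
    n = arr.shape[axis]
    lower = np.take(arr, np.arange(n - 1), axis=axis)
    upper = np.take(arr, np.arange(1, n), axis=axis)
    return 0.5 * (lower + upper)


def rotate_to_earth(u_grid, v_grid, sinalpha, cosalpha):
    """Rotate grid-relative wind components to earth-relative (east, north).

    sinalpha and cosalpha are the 2-D (south_north, west_east) fields from a
    wrfout file; NumPy broadcasting applies them over any leading
    (time, level) dimensions of u_grid and v_grid.
    """
    u_earth = u_grid * cosalpha - v_grid * sinalpha
    v_earth = v_grid * cosalpha + u_grid * sinalpha
    return u_earth, v_earth
```

WRF stores 3-D `U` on a grid staggered in `west_east` and `V` staggered in `south_north`, so one would call `destagger(U, axis=-1)` and `destagger(V, axis=-2)` before rotating; the 10 m winds (`U10`, `V10`) are already on mass points. The rotation changes direction only, so wind speed is preserved.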
High Performance
cfdb-ingest is optimized for processing high-resolution meteorological datasets:
- Vectorized processing -- transformations and vertical integrations (like VIMF) are fully vectorized using NumPy.
- Efficient I/O -- uses the `rechunkit` engine to minimize HDF5 read operations when extracting spatial and temporal subsets.
- Intelligent Caching -- manages HDF5 C-level chunk caches and per-timestep Python caches to eliminate redundant data access.
- Parallel Startup -- multi-threaded file scanning ensures fast initialization even with thousands of input files.
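To illustrate what a fully vectorized vertical integration looks like, the sketch below computes VIMF as (1/g) times the pressure integral of q·u and q·v, using the trapezoidal rule along the level axis with no Python loop over grid points. The function name `vimf`, the (time, level, lat, lon) layout, and the trapezoidal scheme are assumptions for illustration; cfdb-ingest's own formula and its handling of levels below the surface may differ.

```python
import numpy as np

G = 9.80665  # standard gravity, m s-2


def vimf(q, u, v, plev_hpa, axis=1):
    """Eastward and northward vertically integrated moisture flux (kg m-1 s-1).

    Hypothetical sketch. q is specific humidity (kg kg-1) and u, v are wind
    components (m s-1), all with the pressure-level dimension at `axis`
    (e.g. time, level, lat, lon). plev_hpa holds 1-D pressure levels in hPa,
    in either order. Simplification: integrates over every level, without
    masking levels that lie below the surface pressure.
    """
    q = np.asarray(q, dtype=float)
    p = np.asarray(plev_hpa, dtype=float) * 100.0  # hPa -> Pa
    # Reshape layer thicknesses so they broadcast along the level axis.
    shape = [1] * q.ndim
    shape[axis] = p.size - 1
    dp = np.abs(np.diff(p)).reshape(shape)
    lower_idx = np.arange(p.size - 1)
    upper_idx = np.arange(1, p.size)

    def integrate(flux):
        # Trapezoidal rule: mean of adjacent levels times layer thickness.
        lower = np.take(flux, lower_idx, axis=axis)
        upper = np.take(flux, upper_idx, axis=axis)
        return np.sum(0.5 * (lower + upper) * dp, axis=axis) / G

    return integrate(q * u), integrate(q * v)
```

The whole calculation is a handful of element-wise array operations and one reduction over the level axis, so its cost scales with array size rather than with Python-level iteration over columns.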