# cfdb

CF conventions multi-dimensional array storage on top of Booklet.
cfdb is a pure Python database for managing labeled multi-dimensional arrays following the CF conventions. It is an alternative to netCDF4/xarray, built on Booklet for local file storage and EBooklet for S3 sync.
## Key Features
- CF conventions — coordinates, data variables, and attributes following the CF standard
- Chunk-based storage — efficient compression with zstd or lz4, chunk-level read/write
- Thread-safe and multiprocess-safe — thread locks and file locks for concurrent access
- Rechunking — on-the-fly rechunking via rechunkit for flexible data access patterns
- Parallel map — apply a function to chunks in parallel using multiprocessing
- Grid interpolation — regridding, point sampling, NaN filling, and level regridding via geointerp
- S3 remote sync — EDataset links a local file with an S3 remote via EBooklet
- NetCDF4 export — convert to netCDF4 with h5netcdf, and from netCDF4 with cfdb-ingest
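The idea behind the chunk-based storage feature can be illustrated without cfdb itself. Below is a minimal sketch of splitting an array into fixed-shape chunks and compressing each one independently, so a single chunk can be read back without decompressing the rest. It uses stdlib `zlib` as a stand-in for zstd/lz4 and an in-memory dict as a stand-in for the Booklet file; none of these names are cfdb's actual internals.

```python
import zlib

import numpy as np


def write_chunks(arr, chunk_shape):
    """Split a 2D array into chunks and compress each chunk independently."""
    store = {}
    cr, cc = chunk_shape
    for r in range(0, arr.shape[0], cr):
        for c in range(0, arr.shape[1], cc):
            chunk = np.ascontiguousarray(arr[r:r + cr, c:c + cc])
            # Each chunk is compressed on its own, so reading one chunk
            # later does not require touching any other chunk.
            store[(r, c)] = (chunk.shape, zlib.compress(chunk.tobytes()))
    return store


def read_chunk(store, key, dtype):
    """Decompress and reconstruct a single chunk by its key."""
    shape, blob = store[key]
    return np.frombuffer(zlib.decompress(blob), dtype=dtype).reshape(shape)


arr = np.arange(36, dtype='float32').reshape(6, 6)
store = write_chunks(arr, (4, 4))
# Edge chunks are smaller than the nominal chunk shape:
chunk = read_chunk(store, (4, 4), 'float32')  # the bottom-right 2x2 chunk
```

The same layout is what makes chunk-level read/write and per-chunk compression choices (zstd vs lz4) possible.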
## Quick Example
```python
import cfdb
import numpy as np

file_path = 'example.cfdb'

with cfdb.open_dataset(file_path, flag='n') as ds:
    # Create coordinates
    lat = ds.create.coord.lat(data=np.linspace(-90, 90, 181, dtype='float32'))
    lon = ds.create.coord.lon(data=np.linspace(-180, 180, 361, dtype='float32'))

    # Create a data variable
    temp = ds.create.data_var.generic(
        'temperature', ('latitude', 'longitude'), dtype='float32'
    )

    # Write data
    temp[:] = np.random.rand(181, 361).astype('float32') * 40 - 10

# Read it back
with cfdb.open_dataset(file_path) as ds:
    for chunk_slices, data in ds['temperature'].iter_chunks():
        print(chunk_slices, data.shape)
```
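Chunk iteration like the loop above is also the basis of the parallel map feature: each chunk can be processed independently and the partial results combined. The sketch below shows the pattern with plain NumPy rather than cfdb's API, and uses a thread pool as a stand-in for cfdb's multiprocessing-based map; `chunk_slices` and `chunk_mean` are illustrative names, not cfdb functions.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def chunk_slices(shape, chunk_shape):
    """Yield slice tuples covering a 2D array chunk by chunk."""
    for r in range(0, shape[0], chunk_shape[0]):
        for c in range(0, shape[1], chunk_shape[1]):
            yield (slice(r, r + chunk_shape[0]), slice(c, c + chunk_shape[1]))


arr = np.random.rand(181, 361).astype('float32') * 40 - 10


def chunk_sum(sl):
    # Each task reads one chunk and reduces it independently.
    return arr[sl].sum(dtype='float64')


with ThreadPoolExecutor() as pool:
    partials = list(pool.map(chunk_sum, chunk_slices(arr.shape, (50, 50))))

# Combine the per-chunk reductions into a global result.
total_mean = sum(partials) / arr.size
```

Because chunks never overlap, the per-chunk reductions combine exactly into the whole-array result, which is what makes this pattern safe to run in parallel.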
## Next Steps
- Installation — install cfdb and optional extras
- Quick Start — complete walkthrough of a typical workflow
- User Guide — detailed guides for every feature
- API Reference — full function and class reference