
cfdb

CF conventions multi-dimensional array storage on top of Booklet

cfdb is a pure Python database for managing labeled multi-dimensional arrays following the CF conventions. It is an alternative to netCDF4/xarray, built on Booklet for local file storage and EBooklet for S3 sync.

Key Features

  • CF conventions — coordinates, data variables, and attributes following the CF standard
  • Chunk-based storage — efficient compression with zstd or lz4, chunk-level read/write
  • Thread-safe and multiprocess-safe — thread locks and file locks for concurrent access
  • Rechunking — on-the-fly rechunking via rechunkit for flexible data access patterns
  • Parallel map — apply a function to chunks in parallel using multiprocessing
  • Grid interpolation — regridding, point sampling, NaN filling, and level regridding via geointerp
  • S3 remote sync — EDataset links a local file with an S3 remote via EBooklet
  • NetCDF4 interop — export to netCDF4 with h5netcdf and ingest existing netCDF4 files with cfdb-ingest
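
Chunk-level access means every read or write maps onto a grid of chunk slices. The following standalone sketch (plain NumPy and itertools, no cfdb — the chunk shape is an illustrative choice, not cfdb's default) shows the slice grid that chunked storage iterates over:

```python
import itertools
import numpy as np

def chunk_slices(shape, chunk_shape):
    """Yield slice tuples covering an array of `shape` in blocks of `chunk_shape`."""
    starts = [range(0, s, c) for s, c in zip(shape, chunk_shape)]
    for origin in itertools.product(*starts):
        yield tuple(
            slice(o, min(o + c, s))
            for o, c, s in zip(origin, chunk_shape, shape)
        )

# A (181, 361) grid split into (90, 90) chunks -> 3 x 5 = 15 chunks,
# with smaller partial chunks along the trailing edges.
slices = list(chunk_slices((181, 361), (90, 90)))
```

Each slice tuple addresses exactly one stored chunk, which is why chunk-aligned reads and writes avoid decompressing neighbouring data.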

Quick Example

import cfdb
import numpy as np

file_path = 'example.cfdb'

with cfdb.open_dataset(file_path, flag='n') as ds:
    # Create coordinates
    lat = ds.create.coord.lat(data=np.linspace(-90, 90, 181, dtype='float32'))
    lon = ds.create.coord.lon(data=np.linspace(-180, 180, 361, dtype='float32'))

    # Create a data variable
    temp = ds.create.data_var.generic(
        'temperature', ('latitude', 'longitude'), dtype='float32'
    )

    # Write data
    temp[:] = np.random.rand(181, 361).astype('float32') * 40 - 10

# Read it back
with cfdb.open_dataset(file_path) as ds:
    for chunk_slices, data in ds['temperature'].iter_chunks():
        print(chunk_slices, data.shape)
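
Iterating chunks also supports streaming reductions: statistics can be accumulated chunk by chunk without loading the whole variable into memory. A minimal sketch of the pattern in plain NumPy, where a random array stands in for the temperature variable and 90-row blocks play the role of cfdb's chunks:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((181, 361), dtype='float32') * 40 - 10  # stand-in for ds['temperature']

# Accumulate sum and count per chunk, as one would inside the
# iter_chunks() loop, instead of materialising the full array.
total = 0.0
count = 0
for i in range(0, data.shape[0], 90):
    block = data[i:i + 90]                    # one "chunk" of rows
    total += float(block.sum(dtype='float64'))
    count += block.size

mean = total / count
```

Summing each block in float64 keeps the running total accurate even though the stored data is float32.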

Next Steps