Quick Start

This guide walks through a complete cfdb workflow: creating a dataset, adding coordinates and data variables, writing data, reading data back, and exporting.

Create a Dataset

import cfdb
import numpy as np

file_path = 'quickstart.cfdb'

ds = cfdb.open_dataset(file_path, flag='n')
# ... work with ds ...
ds.close()

The flag='n' creates a new, empty file, replacing any existing file at that path. Always close the dataset when you are done, or use a context manager so it is closed automatically:

with cfdb.open_dataset(file_path, flag='n') as ds:
    # work with ds
    pass

Create Coordinates

Coordinates must be created before data variables. Use template methods for common dimensions:

with cfdb.open_dataset(file_path, flag='n') as ds:
    # Latitude with template method
    lat_data = np.linspace(0, 19.9, 200, dtype='float32')
    lat = ds.create.coord.lat(data=lat_data, chunk_shape=(20,))

    # Time with template method
    time_data = np.arange('2020-01-01', '2020-07-19', dtype='datetime64[D]')
    time = ds.create.coord.time(data=time_data, dtype_decoded=time_data.dtype)

    print(lat)
    print(time)

Coordinate data must be unique and sorted in ascending order. Once written, values cannot be changed; they can only be appended or prepended.
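Because cfdb enforces these rules, it can help to validate an array before passing it in. A minimal check in plain NumPy (the helper name is my own, not part of cfdb's API):

```python
import numpy as np

def is_valid_coord(data):
    """Return True if data is strictly ascending (unique and sorted)."""
    # np.diff is positive everywhere only when each value is
    # strictly greater than the one before it.
    return bool(np.all(np.diff(data) > 0))

lat_data = np.linspace(0, 19.9, 200, dtype='float32')
print(is_valid_coord(lat_data))             # True
print(is_valid_coord(np.array([1, 1, 2])))  # False: duplicate value
```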

Create a Data Variable

Data variables are linked to one or more coordinates by name:

with cfdb.open_dataset(file_path, flag='w') as ds:
    data_var = ds.create.data_var.generic(
        'temperature',
        ('latitude', 'time'),
        dtype='float32',
    )
    print(data_var)

Write Data

The simplest way to write data:

data = np.random.rand(200, 200).astype('float32') * 40

with cfdb.open_dataset(file_path, flag='w') as ds:
    ds['temperature'][:] = data

For large datasets, iterate over chunk positions:

with cfdb.open_dataset(file_path, flag='w') as ds:
    temp = ds['temperature']
    for chunk_slices in temp.iter_chunks(include_data=False):
        temp[chunk_slices] = data[chunk_slices]
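To see what those per-chunk slice tuples look like, the iteration pattern can be reproduced in plain NumPy. This is a sketch of the idea, not cfdb's actual implementation; it walks a (200, 200) array in (20, 100) chunks:

```python
import itertools
import numpy as np

def gen_chunk_slices(shape, chunk_shape):
    """Yield tuples of slices covering an array chunk by chunk."""
    ranges = [range(0, s, c) for s, c in zip(shape, chunk_shape)]
    for starts in itertools.product(*ranges):
        yield tuple(slice(i, min(i + c, s))
                    for i, c, s in zip(starts, chunk_shape, shape))

src = np.random.rand(200, 200).astype('float32') * 40
dst = np.empty_like(src)
for chunk_slices in gen_chunk_slices(src.shape, (20, 100)):
    dst[chunk_slices] = src[chunk_slices]

print(np.array_equal(src, dst))  # True: every chunk was copied
```

Writing chunk by chunk keeps only one chunk's worth of data in flight at a time, which is the point of this pattern for arrays too large to assign in one slice.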

Read Data

Read the entire variable into memory (suitable only for datasets that fit in memory):

with cfdb.open_dataset(file_path) as ds:
    all_data = ds['temperature'].values
    print(all_data.shape)
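Whether a full read is safe depends on the decoded array size, which you can estimate up front from the shape and dtype. A back-of-the-envelope check in NumPy (the 1 GiB threshold is illustrative, not a cfdb limit):

```python
import numpy as np

shape = (200, 200)                       # (latitude, time) from this guide
itemsize = np.dtype('float32').itemsize  # 4 bytes per element
nbytes = int(np.prod(shape)) * itemsize

print(nbytes)            # 160000 bytes, i.e. ~0.16 MB: fine to read whole
print(nbytes < 1 << 30)  # True: well under a 1 GiB threshold
```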

For large datasets, iterate over chunks:

with cfdb.open_dataset(file_path) as ds:
    temp = ds['temperature']
    for chunk_slices, chunk_data in temp.iter_chunks():
        print(chunk_slices, chunk_data.shape)

Group By

Iterate by one coordinate dimension:

with cfdb.open_dataset(file_path) as ds:
    for slices, data in ds['temperature'].groupby('latitude'):
        print(slices, data.shape)
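Conceptually, grouping by one dimension iterates unit slices along that axis while taking everything along the others. A plain-NumPy sketch of the idea (not cfdb's implementation):

```python
import numpy as np

data = np.arange(12, dtype='float32').reshape(3, 4)  # (latitude, time)

# Group along axis 0 (latitude): one (1, 4) block per latitude value.
for i in range(data.shape[0]):
    slices = (slice(i, i + 1), slice(None))
    group = data[slices]
    print(slices, group.shape)
```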

Parallel Map

Apply a function to each chunk in parallel:

def double_values(target_chunk, data):
    return data * 2

with cfdb.open_dataset(file_path, flag='w') as ds:
    temp = ds['temperature']
    for target_chunk, result in temp.map(double_values, n_workers=4):
        temp[target_chunk] = result

The function must be defined at module top level (not a lambda or nested function), since it is sent to worker processes by pickling. Return None to skip writing a chunk.
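The same top-level-function requirement applies to Python's standard multiprocessing tools, because process pools pickle the function to ship it to workers. A rough equivalent of the map-then-write pattern using concurrent.futures (a sketch standing in for cfdb's machinery; a thread pool is used here so the example is self-contained):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Top-level function: process-based pools pickle the function to send it
# to workers, and lambdas or nested functions cannot be pickled.
def double_values(chunk_slices, data):
    return data * 2

src = np.arange(8, dtype='float32').reshape(2, 4)
dst = np.empty_like(src)

# One task per row "chunk"; the pool stands in for cfdb's workers.
chunks = [(slice(i, i + 1), slice(None)) for i in range(src.shape[0])]
with ThreadPoolExecutor(max_workers=4) as pool:
    for chunk_slices, result in pool.map(
            lambda s: (s, double_values(s, src[s])), chunks):
        if result is not None:  # returning None skips the chunk
            dst[chunk_slices] = result

print(np.array_equal(dst, src * 2))  # True
```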

Attributes

Attach JSON-serializable metadata to variables or the dataset:

with cfdb.open_dataset(file_path, flag='w') as ds:
    ds.attrs['title'] = 'Quick start example'
    ds['temperature'].attrs['units'] = 'degC'
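"JSON-serializable" means the value must round-trip through Python's json module: strings, numbers, booleans, None, and lists or dicts of those. A quick way to check a candidate attribute value before attaching it (illustrative only):

```python
import json

attrs = {'title': 'Quick start example', 'units': 'degC',
         'valid_range': [0.0, 40.0]}
print(json.dumps(attrs))  # works: every value is a JSON type

try:
    json.dumps({'created': object()})  # arbitrary objects are rejected
except TypeError as err:
    print('not JSON-serializable:', err)
```

Note that NumPy scalars and datetime objects are not JSON types either; convert them to plain Python values first.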

Export to NetCDF4

Requires the h5netcdf package (install with pip install cfdb[netcdf4]):

with cfdb.open_dataset(file_path) as ds:
    ds.to_netcdf4('quickstart.nc')

Next Steps