Dataset¶
The Dataset class is the main object for interacting with a cfdb file. It acts as a dictionary of variables (coordinates and data variables). Instances are created via open_dataset().
Usage¶
Properties¶
| Property | Type | Description |
|---|---|---|
| file_path | pathlib.Path | Path to the cfdb file |
| writable | bool | Whether the dataset is open for writing |
| is_open | bool | Whether the dataset is currently open |
| compression | str | Compression algorithm ('zstd' or 'lz4') |
| compression_level | int | Compression level |
| crs | pyproj.CRS or None | Coordinate reference system |
| attrs | Attributes | Dataset-level attributes |
| create | Creator | Variable creation interface (only when writable) |
| var_names | tuple of str | All variable names |
| coord_names | tuple of str | Coordinate variable names |
| data_var_names | tuple of str | Data variable names |
| coords | tuple | All Coordinate objects |
| data_vars | tuple | All DataVariable objects |
| variables | tuple | All Variable objects |
Dict-Like Access¶
ds['temperature'] # Get variable by name
'temperature' in ds # Check if variable exists
len(ds) # Number of variables
for name in ds: # Iterate variable names
    print(name)
del ds['temperature'] # Delete a variable (writable only)
Methods¶
get(var_name)¶
Get a variable by name. Returns a Coordinate or DataVariable.
rechunker(data_vars=None)¶
Return a DatasetRechunker for multiple variables.
Parameters:
- data_vars (list of str, optional): The data variables to include. Defaults to all.
select(sel)¶
Filter the dataset by coordinate index positions. Returns a read-only DatasetView.
select_loc(sel)¶
Filter by coordinate values. Returns a read-only DatasetView.
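The difference between selecting by position and selecting by value can be illustrated without cfdb itself. The sketch below uses a plain sorted list as a stand-in for a coordinate and a hypothetical helper, loc_to_positions, that is not part of cfdb. Whether cfdb treats the upper bound of a value range as inclusive is not stated here, so the half-open convention used below is an assumption.

```python
from bisect import bisect_left

# Stand-in for a sorted coordinate (e.g. latitude values); not a cfdb object.
latitude = [-45.0, -44.5, -44.0, -43.5, -43.0]


def loc_to_positions(coord_values, start, stop):
    """Translate a half-open value range [start, stop) into an index slice.

    Hypothetical helper; assumes coord_values is sorted in ascending order.
    """
    return slice(bisect_left(coord_values, start), bisect_left(coord_values, stop))


# Positional selection: pick the items at positions 1 and 2 directly.
by_position = latitude[slice(1, 3)]

# Value-based selection: look up the matching positions first, then slice.
pos = loc_to_positions(latitude, -44.5, -43.5)
by_value = latitude[pos]
```

Both selections pick the same two items here; the value-based form simply performs the lookup from coordinate values to positions first.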
copy(file_path, include_data_vars=None, exclude_data_vars=None)¶
Copy the dataset to a new cfdb file. Returns the new Dataset (caller must close it).
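Because the returned Dataset stays open until the caller closes it, one way to make the close unconditional is the standard library's contextlib.closing, which calls close() on exit even if an exception is raised. The sketch below uses a hypothetical FakeDataset stand-in rather than a real cfdb object, so only the close()/is_open behaviour described on this page is assumed.

```python
from contextlib import closing


class FakeDataset:
    """Stand-in exposing only close() and is_open; not the cfdb class."""

    def __init__(self):
        self.is_open = True

    def close(self):
        self.is_open = False


def fake_copy():
    # Plays the role of ds.copy(...), which returns an open Dataset.
    return FakeDataset()


# closing() guarantees new_ds.close() runs when the block exits.
with closing(fake_copy()) as new_ds:
    was_open_inside = new_ds.is_open
```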
to_netcdf4(file_path, compression='gzip', include_data_vars=None, exclude_data_vars=None)¶
Export to netCDF4 format. Requires h5netcdf.
iter_chunks(chunk_shape, data_vars=None, max_mem=2**29)¶
Iterate over aligned chunks of multiple data variables. By default, yields (target_chunk, var_data) pairs, where target_chunk is a dict of {coord_name: slice} and var_data is a dict of {var_name: ndarray}. With include_data=False, only the target_chunk dicts are yielded.
Parameters:
- chunk_shape (dict): {coord_name: int} for target chunk sizes.
- data_vars (list of str, optional): Variables to include.
- max_mem (int): Total memory budget in bytes for the entire batch. Default 2**29 (512 MiB).
- include_data (bool): If False, yields only {coord_name: slice} dicts without loading data. Default True.
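To make the shape of target_chunk concrete, the following sketch builds {coord_name: slice} dicts from a chunk_shape on a regular grid. The helper target_chunks and the coord_sizes argument are hypothetical, and the sketch assumes chunks start at position 0 and that the final chunk along each coordinate is clipped; the order and alignment cfdb actually uses may differ.

```python
from itertools import product


def target_chunks(coord_sizes, chunk_shape):
    """Yield {coord_name: slice} dicts that tile a regular grid.

    coord_sizes: {coord_name: length of that coordinate}
    chunk_shape: {coord_name: target chunk size}
    """
    names = list(chunk_shape)
    per_coord = []
    for name in names:
        size, step = coord_sizes[name], chunk_shape[name]
        # Clip the last slice so it never runs past the end of the coordinate.
        per_coord.append(
            [slice(i, min(i + step, size)) for i in range(0, size, step)]
        )
    # Every combination of per-coordinate slices is one target chunk.
    for combo in product(*per_coord):
        yield dict(zip(names, combo))


chunks = list(target_chunks({'time': 5, 'latitude': 4}, {'time': 2, 'latitude': 4}))
```

With a time coordinate of length 5 and a chunk size of 2, the sketch produces three chunks along time, the last one covering a single position.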
groupby(coord_names, data_vars=None, max_mem=2**27)¶
Group by one or more coordinates across all data variables. Accepts a string, list of strings, or dict. Dict values can be int (chunk size) or str (time period like 'D', 'M', 'Y', '6h').
# Group by individual coordinate values
for target_chunk, var_data in ds.groupby('latitude'):
    print(target_chunk, {k: v.shape for k, v in var_data.items()})
# Group by time period
for target_chunk, var_data in ds.groupby({'time': 'M'}, data_vars=['temperature']):
    print(target_chunk, {k: v.shape for k, v in var_data.items()})
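Grouping by a time period amounts to cutting a sorted time coordinate into runs that share the same period. The sketch below shows that idea for calendar months using only the standard library. The helper month_slices is hypothetical, and representing each group as a positional slice is an assumption made for illustration, not a description of cfdb's internals.

```python
from datetime import date, timedelta
from itertools import groupby

# Stand-in for a daily time coordinate: Jan 30, Jan 31, Feb 1, Feb 2 (2024).
time = [date(2024, 1, 30) + timedelta(days=i) for i in range(4)]


def month_slices(times):
    """Yield ((year, month), slice) pairs for a sorted sequence of dates."""
    start = 0
    for period, members in groupby(times, key=lambda t: (t.year, t.month)):
        n = len(list(members))
        # Each group covers a contiguous run of positions.
        yield period, slice(start, start + n)
        start += n


groups = list(month_slices(time))
```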
map(func, chunk_shape, data_vars=None, max_mem=2**27, n_workers=None)¶
Apply a function to aligned chunks in parallel. The function receives (target_chunk, var_data), the same pair that iter_chunks yields.
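A function suitable for this kind of call takes the (target_chunk, var_data) pair and returns a result for that chunk. The sketch below defines such a function, chunk_mean (a hypothetical name), and runs it over hand-built chunks with a thread pool. Plain lists stand in for the ndarrays, and the thread pool is used purely for illustration; it is not a description of how cfdb dispatches work or what map returns.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_mean(target_chunk, var_data):
    """Return the mean of each variable in one chunk.

    Plain lists stand in for the ndarrays a dataset would supply.
    """
    return {name: sum(values) / len(values) for name, values in var_data.items()}


# Hand-built (target_chunk, var_data) pairs standing in for real chunks.
chunks = [
    ({'time': slice(0, 2)}, {'temperature': [10.0, 12.0]}),
    ({'time': slice(2, 4)}, {'temperature': [14.0, 18.0]}),
]

# Executor.map preserves input order, so results line up with chunks.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(lambda args: chunk_mean(*args), chunks))
```

Defining the function at module level, as here, also keeps it picklable, which matters whenever workers run in separate processes.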
close()¶
Close the database and flush metadata to disk.
prune(timestamp=None, reindex=False)¶
Prune deleted data from the file. Returns the number of removed items.
DatasetView¶
Returned by select() and select_loc(). Provides the same read interface as Dataset but is read-only and scoped to the selection. Selections can be chained via select() and select_loc() on the view.
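Chaining positional selections means the second selection is interpreted relative to the first. The sketch below shows how two successive slices collapse into a single slice of the original, using Python's sliceable range objects. The helper compose is hypothetical and assumes positive steps; it illustrates the arithmetic only and makes no claim about how DatasetView stores its selection.

```python
def compose(outer, inner, length):
    """Combine two successive positional slices into one slice of the original.

    Slicing a range yields another range, so the range does the index math.
    Assumes both slices use positive steps.
    """
    r = range(length)[outer][inner]
    return slice(r.start, r.stop, r.step)


data = list(range(100, 110))  # stand-in for a coordinate of length 10
first = slice(2, 8)           # first selection: positions 2 to 7
second = slice(1, 4)          # second selection, relative to the first

combined = compose(first, second, len(data))
```

Applying combined to the original gives the same items as applying first and then second.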
cfdb.main.DatasetView¶
Bases: DatasetBase
A view of a subset of the dataset. This object is returned when a dataset is sliced/selected. It provides read-only access to the variables within the selection.
var_names (property)¶
Return a tuple of all the variable names (coordinates and data variables).
coord_names (property)¶
Return a tuple of all the coordinate names.
data_var_names (property)¶
Return a tuple of all the data variable names.
crs (property)¶
get(var_name)¶
Get a variable contained within the dataset.
select(sel)¶
Narrow this view further by coordinate positions.
select_loc(sel)¶
Narrow this view further by coordinate values.