Data Types

cfdb uses a custom type system that handles both the in-memory representation (decoded) and the on-disk serialization (encoded) of data. Nearly all of these types are backed by numpy dtypes, so a numpy dtype can usually be passed instead of a cfdb dtype. cfdb dtypes are only necessary when the decoded type differs from the encoded type (e.g. float64 in memory encoded to int16 on disk).

The dtype() Factory

Create a DataType using cfdb.dtypes.dtype():

import cfdb

# Simple float — no encoding, stored as-is
dt = cfdb.dtypes.dtype('float32')

# Float with precision and integer encoding
dt = cfdb.dtypes.dtype('float64', precision=2, min_value=-50.0, max_value=100.0)

# Datetime
dt = cfdb.dtypes.dtype('datetime64[D]')

# Geometry
dt = cfdb.dtypes.dtype('point', precision=6)

Parameters

Parameter	Type	Description
`name`	str, np.dtype, or DataType	Type name or existing dtype
`precision`	int or None	Decimal precision (floats) or WKT rounding (geometry)
`min_value`	number or None	Minimum value (for integer encoding)
`max_value`	number or None	Maximum value (for integer encoding)
`dtype_encoded`	str or None	Explicit encoded dtype name
`offset`	number or None	Offset for encoding
`fillvalue`	int or None	Fill value for encoding

Type Classes

Float

Stores floating-point data. Can optionally encode to a smaller integer for better compression:

# No encoding — stored as float32 bytes
dt = cfdb.dtypes.dtype('float32')

# With precision only — rounds to 2 decimals but stays float
dt = cfdb.dtypes.dtype('float64', precision=2)

# With integer encoding — compresses float64 to uint16 using scale+offset
dt = cfdb.dtypes.dtype('float64', precision=2, min_value=-50.0, max_value=100.0)

When min_value, max_value, and precision are provided, cfdb computes the smallest unsigned integer type and offset needed to represent the range.
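A minimal sketch of that selection follows. It assumes the encoded value is the number of 10**-precision steps above min_value (so the offset is min_value) and that one extra code is reserved for a fill value; both are assumptions for illustration, and cfdb's own formula may differ. The helper name smallest_uint is hypothetical.

```python
# Unsigned integer types in increasing size: (name, largest representable value)
UINT_TYPES = [
    ("uint8", 2**8 - 1),
    ("uint16", 2**16 - 1),
    ("uint32", 2**32 - 1),
    ("uint64", 2**64 - 1),
]


def smallest_uint(min_value, max_value, precision, reserve_fill=True):
    """Hypothetical helper: pick the smallest unsigned type for a float range.

    Assumes values are stored as round((x - min_value) * 10**precision),
    i.e. offset = min_value and scale = 10**-precision.
    """
    # Number of precision-sized steps between min_value and max_value
    steps = round((max_value - min_value) * 10**precision)
    # Assumption: one extra code is reserved for a fill value
    needed = steps + 1 if reserve_fill else steps
    for name, limit in UINT_TYPES:
        if needed <= limit:
            return name, min_value
    raise ValueError("range too large for a 64-bit unsigned integer")


print(smallest_uint(-50.0, 100.0, precision=2))  # ('uint16', -50.0)
```

With the range from the example above, 15,000 steps fit comfortably in uint16 but not in uint8, which matches the uint16 noted in the earlier code comment.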

Integer

Stores integer data. Can optionally encode to a smaller integer:

dt = cfdb.dtypes.dtype('int32')

# int64 will be encoded to int16
dt = cfdb.dtypes.dtype('int64', min_value=0, max_value=1000)
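The comment above implies that cfdb narrows the integer to the smallest signed type covering [min_value, max_value]. A minimal sketch of that rule, assuming no offset is applied (cfdb may also shift the range with an offset); smallest_int is a hypothetical name, not a cfdb function.

```python
# Signed integer types in increasing size: (name, min, max)
INT_TYPES = [
    ("int8", -(2**7), 2**7 - 1),
    ("int16", -(2**15), 2**15 - 1),
    ("int32", -(2**31), 2**31 - 1),
    ("int64", -(2**63), 2**63 - 1),
]


def smallest_int(min_value, max_value):
    """Hypothetical helper: smallest signed type that holds [min_value, max_value]."""
    for name, lo, hi in INT_TYPES:
        if lo <= min_value and max_value <= hi:
            return name
    raise ValueError("range does not fit in int64")


print(smallest_int(0, 1000))  # int16
```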

DateTime

Stores numpy datetime64 data:

# Days precision
dt = cfdb.dtypes.dtype('datetime64[D]')

# Hourly precision
dt = cfdb.dtypes.dtype('datetime64[h]')

# Nanosecond precision (uses int64 encoding)
dt = cfdb.dtypes.dtype('datetime64[ns]')

See the numpy datetime reference for frequency codes. Do not use frequencies finer than 'ns'.
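numpy stores each datetime64 value as a signed 64-bit count of its unit since 1970-01-01. The stdlib-only sketch below computes those counts for one date and shows a likely reason for the 'ns' limit: a 64-bit nanosecond count spans only about 292 years on either side of 1970, and finer units shrink that span further. The function datetime64_count is hypothetical and only imitates numpy's representation; cfdb's internals may differ.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

# A few numpy datetime units expressed as timedeltas
UNITS = {"D": timedelta(days=1), "h": timedelta(hours=1), "s": timedelta(seconds=1)}


def datetime64_count(value, unit):
    """Hypothetical: integer count of `unit` since the epoch (floor division)."""
    return (value - EPOCH) // UNITS[unit]


t = datetime(2024, 1, 1)
print(datetime64_count(t, "D"))  # 19723
print(datetime64_count(t, "h"))  # 473352

# Span of a signed 64-bit nanosecond count, in years either side of the epoch
ns_per_year = 365.25 * 86_400 * 10**9
print(round((2**63 - 1) / ns_per_year))  # 292
```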

Bool

Stores boolean data:

dt = cfdb.dtypes.dtype('bool')

String

Stores variable-length strings via msgpack serialization:

dt = cfdb.dtypes.dtype('str')

String coordinates do not need to be in ascending order.
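Variable-length strings cannot be written with a plain tobytes() call the way fixed-width numbers can, so the whole array is serialized instead. msgpack is a third-party package, so the sketch below swaps in the stdlib json module purely to illustrate the round trip; it is not cfdb's byte format. It also contrasts the result with a fixed-width layout such as numpy's '<U' dtype, which pads every element to the longest string at 4 bytes per character.

```python
import json


def dumps_strings(values):
    """Serialize variable-length strings to bytes (json stand-in for msgpack)."""
    return json.dumps(values, ensure_ascii=False).encode("utf-8")


def loads_strings(data):
    """Inverse of dumps_strings."""
    return json.loads(data.decode("utf-8"))


names = ["a", "Wellington", "Ōtautahi"]
blob = dumps_strings(names)
assert loads_strings(blob) == names

# Fixed-width UCS-4 layout: every element padded to the longest string
fixed_width_bytes = 4 * max(len(s) for s in names) * len(names)
print(len(blob), fixed_width_bytes)
```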

Geometry Types

Geometry data uses shapely objects, serialized to WKT strings and stored via msgpack. The precision parameter controls WKT rounding:

# Point geometries with 6 decimal places
dt = cfdb.dtypes.dtype('point', precision=6)

# LineString
dt = cfdb.dtypes.dtype('linestring', precision=4)

# Polygon
dt = cfdb.dtypes.dtype('polygon', precision=4)
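cfdb delegates WKT writing to shapely. The stdlib-only sketch below only imitates the effect of the precision parameter so it can run without shapely: coordinates are written with a fixed number of decimals, so points that differ by less than 10**-precision serialize to the same WKT string. The helper to_wkt is hypothetical, and shapely's actual output (for example, trailing-zero trimming) may differ.

```python
def _fmt(value, precision):
    # Fixed number of decimals, mirroring WKT rounding to `precision`
    return f"{value:.{precision}f}"


def to_wkt(kind, coords, precision):
    """Hypothetical WKT writer for 'point' and 'linestring' (polygons omitted)."""
    pairs = ", ".join(f"{_fmt(x, precision)} {_fmt(y, precision)}" for x, y in coords)
    if kind == "point":
        return f"POINT ({pairs})"
    if kind == "linestring":
        return f"LINESTRING ({pairs})"
    raise ValueError(f"unsupported geometry kind: {kind}")


print(to_wkt("point", [(174.7762291, -41.2864603)], precision=6))
# POINT (174.776229 -41.286460)
print(to_wkt("linestring", [(0.123456, 1.0), (2.5, 3.987654)], precision=4))
# LINESTRING (0.1235 1.0000, 2.5000 3.9877)
```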

Encoding and Decoding

Each dtype handles two transformations:

encode/decode — convert between the decoded numpy dtype and the encoded dtype (e.g., float64 → uint16 using scale and offset)
dumps/loads — convert between numpy arrays and raw bytes for storage

When no encoding is configured, dumps/loads use tobytes()/frombuffer() directly.
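The two transformations compose into a round trip. The stdlib sketch below reuses the float64 → uint16 case from the Float section: encode subtracts an offset and divides by a scale, dumps packs the integers into bytes, and loads and decode reverse both steps. The scale (10**-precision) and offset (min_value) are assumptions for illustration, and the function names only mirror the terms above; they are not cfdb's method signatures.

```python
from array import array

PRECISION = 2
OFFSET = -50.0          # assumed: offset = min_value
SCALE = 10**-PRECISION  # assumed: one integer step per 0.01


def encode(values):
    # float -> non-negative integer step counts
    return [round((v - OFFSET) / SCALE) for v in values]


def decode(codes):
    # integer step counts -> float, rounded back to the stored precision
    return [round(c * SCALE + OFFSET, PRECISION) for c in codes]


def dumps(codes):
    # 'H' is an unsigned short (16-bit on common platforms)
    return array("H", codes).tobytes()


def loads(data):
    codes = array("H")
    codes.frombytes(data)
    return codes.tolist()


values = [-50.0, 0.0, 23.456, 100.0]
raw = dumps(encode(values))
print(decode(loads(raw)))  # [-50.0, 0.0, 23.46, 100.0]
```

Each value occupies 2 bytes on disk instead of 8 for float64, at the cost of rounding to the chosen precision (23.456 comes back as 23.46).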

Using with compute_scale_and_offset

The helper function cfdb.compute_scale_and_offset can pre-calculate encoding parameters:

from cfdb import compute_scale_and_offset

# Determine scale and offset for a data range
scale, offset = compute_scale_and_offset(min_val=-50.0, max_val=100.0, precision=2)
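Assuming the returned pair follows the CF-convention packing rule (decoded value = stored integer × scale_factor + add_offset), the relationship between a value x, its stored integer q, and the returned scale s and offset o is shown below. The specific choice s = 10^{-precision} and o = min_val is an assumption used only to check the range, not a statement of what compute_scale_and_offset returns.

$$
x = s\,q + o, \qquad q = \operatorname{round}\!\left(\frac{x - o}{s}\right)
$$

$$
q_{\max} = \frac{x_{\max} - x_{\min}}{s} = \frac{100 - (-50)}{10^{-2}} = 15000 \le 2^{16} - 1 = 65535
$$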