Data Types

dtype() Factory

dtype(name, precision=None, min_value=None, max_value=None, dtype_encoded=None, offset=None, fillvalue=None)

Initialises a cfdb DataType. A cfdb DataType describes both the data type of the user's data and how that data is serialised (and encoded) to bytes.

Parameters:

Name Type Description Default
name str | dtype | DataType

The name of the data type. It can be a string name, a np.dtype, or a DataType. If name is a string, it must correspond to the numpy dtype used for decoding, except for geometry dtypes. Geometry data types do not exist in numpy, so for these name must be one of 'point', 'line', 'linestring', or 'polygon'.

required
precision int

The number of decimal places of precision of the data. This is essentially the value that you'd pass to the round function/method. Applies to float and DateTime dtypes, and must be passed for geometry dtypes.

None
min_value float | int | str | datetime64

The minimum possible value of the data. Together with max_value and precision, it is used to pick a smaller encoded dtype when serialising to bytes. Only applies to float and DateTime dtypes, and is used only to determine the dtype encoding.

None
max_value float | int | str | datetime64

The maximum possible value of the data. See min_value for description.

None
dtype_encoded str

The string name of the np.dtype to be used for the encoding. Only applies to float and DateTime dtypes; offset and fillvalue must also be passed.

None
offset float | int

The offset used when encoding float and DateTime dtypes.

None
fillvalue int

The fillvalue used when encoding float and DateTime dtypes.

None
Notes

When the decoded dtype is a float or DateTime dtype, the data can be encoded to a smaller integer. To determine the appropriate encoding, the precision, min_value and max_value must all be passed. Alternatively, if the user already knows the resulting dtype_encoded, offset, and fillvalue, these must be passed instead of the other three. If neither set is passed, no integer encoding will be used.
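
The idea behind the integer encoding described above can be illustrated with plain numpy. This is a minimal sketch under stated assumptions, not cfdb's actual implementation: the choice of an unsigned dtype, a fillvalue of 0, and the exact rounding formula are all assumptions made for illustration.

```python
import numpy as np

precision = 2
min_value, max_value = -50.0, 100.0
factor = 10 ** precision

# Distinct rounded values in [min_value, max_value], plus one slot for the fill value.
n_values = int(round((max_value - min_value) * factor)) + 2

# Assumption: pick the smallest unsigned integer dtype that can hold n_values.
dtype_encoded = next(
    np.dtype(t) for t in (np.uint8, np.uint16, np.uint32, np.uint64)
    if int(np.iinfo(t).max) + 1 >= n_values
)

fillvalue = 0  # assumption: 0 marks missing data, so real values start at 1
data = np.array([-50.0, 12.3456, np.nan, 100.0])

# Encode: shift by min_value, scale by the precision, round, and reserve 0 for NaN.
scaled = np.round((data - min_value) * factor) + 1
encoded = np.where(np.isnan(data), fillvalue, scaled).astype(dtype_encoded)

# Decode: reverse the steps and restore NaN where the fill value was stored.
decoded = np.where(
    encoded == fillvalue,
    np.nan,
    (encoded.astype('float64') - 1) / factor + min_value,
)

print(dtype_encoded, encoded, decoded)
```

In this sketch a float64 array (8 bytes per value) fits into 2 bytes per value, at the cost of rounding to two decimals.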

Geometry data are encoded and decoded via WKT (and converted to/from bytes via msgpack), and currently the precision parameter is required.
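
To see what precision means for WKT text, the following standard-library sketch formats a point with a fixed number of decimals. The helper point_to_wkt is hypothetical and only illustrates the rounding; it is not part of cfdb or shapely.

```python
def point_to_wkt(x, y, precision):
    # Hypothetical helper: write each coordinate with a fixed number of decimals.
    return f"POINT ({x:.{precision}f} {y:.{precision}f})"

wkt_6 = point_to_wkt(172.63622291, -43.53205412, precision=6)
wkt_2 = point_to_wkt(172.63622291, -43.53205412, precision=2)

print(wkt_6)  # POINT (172.636223 -43.532054)
print(wkt_2)  # POINT (172.64 -43.53)
```

A lower precision yields shorter WKT strings and therefore fewer bytes to store, but coarser coordinates.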

Returns:

Type Description
DataType

Type Classes

All types inherit from DataType and provide dumps()/loads() for serialization.

Float

Floating-point data with optional integer encoding:

cfdb.dtypes.dtype('float32')                                    # no encoding
cfdb.dtypes.dtype('float64', precision=2)                       # rounds to 2 decimals
cfdb.dtypes.dtype('float64', precision=2, min_value=-50, max_value=100)  # integer encoding

Integer

Integer data with optional smaller encoding:

cfdb.dtypes.dtype('int32')
cfdb.dtypes.dtype('int64', min_value=0, max_value=1000)

DateTime

Numpy datetime64 data:

cfdb.dtypes.dtype('datetime64[D]')
cfdb.dtypes.dtype('datetime64[h]')
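
Since numpy stores datetime64 values as 64-bit integer counts of the chosen unit since 1970-01-01, a known date range can in principle be stored in a smaller integer. The sketch below uses numpy only; the offset convention and the fillvalue of 0 are assumptions for illustration, not cfdb's documented behaviour.

```python
import numpy as np

dates = np.array(['2020-01-01', '2020-01-15', 'NaT', '2020-03-01'],
                 dtype='datetime64[D]')

min_value = np.datetime64('2020-01-01', 'D')

fillvalue = 0                               # assumption: 0 marks NaT
offset = int(min_value.astype('int64')) - 1  # assumption: min_value encodes to 1

# Encode only the valid dates; NaT positions keep the fill value.
is_nat = np.isnat(dates)
encoded = np.full(dates.shape, fillvalue, dtype='uint16')
encoded[~is_nat] = (dates[~is_nat].astype('int64') - offset).astype('uint16')

# Decode: add the offset back and restore NaT where the fill value was stored.
decoded = (encoded.astype('int64') + offset).astype('datetime64[D]')
decoded[encoded == fillvalue] = np.datetime64('NaT')

print(encoded)
print(decoded)
```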

Bool

Boolean data:

cfdb.dtypes.dtype('bool')

String

Variable-length strings (msgpack serialized):

cfdb.dtypes.dtype('str')

Point / LineString / Polygon

Geometry types using shapely and WKT:

cfdb.dtypes.dtype('point', precision=6)
cfdb.dtypes.dtype('linestring', precision=4)
cfdb.dtypes.dtype('polygon', precision=4)

DataType Base Class

All types share these attributes:

Attribute Type Description
name str Type name
kind str Kind code (f=float, i=int, M=datetime, T=string, G=geometry, u=unsigned, b=bool)
itemsize int or None Bytes per element (None for variable-length)
dtype_decoded np.dtype Decoded (in-memory) numpy dtype
dtype_encoded np.dtype or None Encoded (on-disk) numpy dtype
precision int or None Decimal precision or WKT rounding
fillvalue int or None Fill value for encoded data
offset number or None Offset for encoding

Helper Function

compute_scale_and_offset

compute_scale_and_offset(min_value, max_value, dtype)

Computes the scale (slope) and offset for a dataset using a min value, max value, and the required np.dtype. It leaves one value at the lower extreme of the dtype unused so that it can serve as the nan fillvalue. These are the min values set aside for the fillvalue (up to 64 bits):

int8: -128
int16: -32768
int32: -2147483648
int64: -9223372036854775808

Unsigned integers are allowed, and a value of 0 is set aside for the fillvalue.

Parameters:

Name Type Description Default
min_value int or float

The min value of the dataset.

required
max_value int or float

The max value of the dataset.

required
dtype dtype

The data type that you want to shrink the data down to.

required

Returns:

Type Description
scale, offset as floats
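
One common way to derive such a scale and offset is a linear mapping of [min_value, max_value] onto the usable integer range, skipping the lowest value for the fillvalue. The function sketch_scale_and_offset below is a hypothetical illustration of that approach and may differ from what compute_scale_and_offset actually does; it assumes the convention decoded = encoded * scale + offset.

```python
import numpy as np

def sketch_scale_and_offset(min_value, max_value, dtype):
    """Hypothetical sketch: map [min_value, max_value] onto an integer dtype."""
    info = np.iinfo(dtype)
    # The lowest representable value is reserved for the fillvalue
    # (e.g. -32768 for int16, 0 for unsigned dtypes).
    lowest = int(info.min) + 1
    highest = int(info.max)
    scale = (max_value - min_value) / (highest - lowest)
    offset = min_value - lowest * scale
    return float(scale), float(offset)

scale, offset = sketch_scale_and_offset(-50, 100, np.int16)

# Encoding the extremes should land on the ends of the usable integer range.
enc_min = round((-50 - offset) / scale)
enc_max = round((100 - offset) / scale)
print(scale, offset, enc_min, enc_max)
```

With int16, the usable range is symmetric around zero, so the offset sits at the midpoint of the data range.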