Data Types¶

dtype() Factory¶

`dtype(name, precision=None, min_value=None, max_value=None, dtype_encoded=None, offset=None, fillvalue=None)` ¶

Function to initialise a cfdb DataType. Data Types in cfdb not only describe the data type that the user's data is in, but also how the data is serialised (and encoded) to bytes.

Parameters:

Name	Type	Description	Default
`name`	`str | np.dtype | DataType`	The name of the data type, given as a string, a np.dtype, or a DataType. A string must correspond to the numpy dtype used for decoding, except for geometry dtypes: these do not exist in numpy, so name must be one of 'point', 'line', 'linestring', or 'polygon'.	required
`precision`	`int`	The number of decimal places of precision of the data; essentially the value you would pass to the round function/method. Applies to float and DateTime dtypes, and is required for geometry dtypes.	`None`
`min_value`	`float | int | str | datetime64`	The minimum possible value of the data. Along with max_value and precision, this helps to shrink the data when serialising to bytes. Only applies to float and DateTime dtypes, and is only used to determine the dtype encoding.	`None`
`max_value`	`float | int | str | datetime64`	The maximum possible value of the data. See min_value for description.	`None`
`dtype_encoded`	`str`	The np.dtype string name to use for the encoding. Only applies to float and DateTime dtypes. If it is passed, offset and fillvalue must also be passed.	`None`
`offset`	`float | int`	The offset used when encoding float and DateTime dtypes.	`None`
`fillvalue`	`int`	The fillvalue used when encoding float and DateTime dtypes.	`None`

Notes

When the decoded dtype is a float or DateTime dtype, the data can be encoded to a smaller integer dtype. To have the appropriate encoding determined automatically, pass precision, min_value, and max_value. Alternatively, if you already know the resulting dtype_encoded, offset, and fillvalue, pass those three instead. If neither set is passed in full, no integer encoding is used.
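As a rough illustration of how precision, min_value, and max_value can pin down an integer encoding, the sketch below counts the distinct rounded values in the range, adds one slot for a fillvalue, and picks the narrowest unsigned integer that fits. The helper name `smallest_uint_for` is hypothetical and the logic is an assumption made for illustration; cfdb's own selection rules may differ (for example, in whether it prefers signed or unsigned types).

```python
# Hypothetical helper for illustration only; not part of cfdb.
def smallest_uint_for(precision, min_value, max_value):
    """Return the name of the narrowest unsigned integer type that can hold
    every value in [min_value, max_value] at the given decimal precision,
    plus one reserved fillvalue."""
    # Number of rounding steps between the two extremes.
    steps = round((max_value - min_value) * 10 ** precision)
    # steps + 1 distinct data values, plus one slot for the fillvalue.
    needed = steps + 2
    for bits in (8, 16, 32, 64):
        if needed <= 2 ** bits:
            return f"uint{bits}"
    raise ValueError("range is too large for a 64-bit integer at this precision")


print(smallest_uint_for(2, -50, 100))  # uint16
```

With precision=2, min_value=-50, and max_value=100 there are 15001 possible values, which is too many for 8 bits but fits comfortably in 16.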

Geometry data are encoded and decoded via WKT (and converted to/from bytes via msgpack), and currently the precision parameter is required.

Returns:

Type	Description
`DataType`

Type Classes¶

All types inherit from DataType and provide dumps()/loads() for serialization.

Float¶

Floating-point data with optional integer encoding:

cfdb.dtypes.dtype('float32')                                    # no encoding
cfdb.dtypes.dtype('float64', precision=2)                       # rounds to 2 decimals
cfdb.dtypes.dtype('float64', precision=2, min_value=-50, max_value=100)  # integer encoding
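To make the idea of integer encoding concrete, here is a minimal standard-library sketch of a round trip for the range used in the last line above. The functions `encode_floats` and `decode_floats`, the constants, and the choice of little-endian uint16 with 0 as the fillvalue are all assumptions for illustration; they are not cfdb's dumps()/loads(), and the real byte layout may differ.

```python
import math
import struct

# Assumed encoding parameters for this sketch (not read from cfdb).
PRECISION = 2    # decimals kept
MIN_VALUE = -50.0
FILLVALUE = 0    # encoded stand-in for NaN; real data start at 1


def encode_floats(values):
    """Round floats to PRECISION decimals and pack them as uint16 bytes."""
    factor = 10 ** PRECISION
    encoded = [
        FILLVALUE if math.isnan(v) else round((v - MIN_VALUE) * factor) + 1
        for v in values
    ]
    return struct.pack(f"<{len(encoded)}H", *encoded)


def decode_floats(data):
    """Unpack uint16 bytes and map them back to floats (NaN for the fillvalue)."""
    factor = 10 ** PRECISION
    encoded = struct.unpack(f"<{len(data) // 2}H", data)
    return [
        math.nan if e == FILLVALUE
        else round((e - 1) / factor + MIN_VALUE, PRECISION)
        for e in encoded
    ]


raw = encode_floats([12.3449, float("nan"), -50.0, 100.0])
decoded = decode_floats(raw)
```

Four values occupy 8 bytes here instead of the 32 bytes they would need as float64. The encoding is lossy by design: 12.3449 comes back as 12.34.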

Integer¶

Integer data with optional smaller encoding:

cfdb.dtypes.dtype('int32')
cfdb.dtypes.dtype('int64', min_value=0, max_value=1000)

DateTime¶

Numpy datetime64 data:

cfdb.dtypes.dtype('datetime64[D]')
cfdb.dtypes.dtype('datetime64[h]')

Bool¶

Boolean data:

cfdb.dtypes.dtype('bool')

String¶

Variable-length strings (msgpack serialized):

cfdb.dtypes.dtype('str')

Point / LineString / Polygon¶

Geometry types using shapely and WKT:

cfdb.dtypes.dtype('point', precision=6)
cfdb.dtypes.dtype('linestring', precision=4)
cfdb.dtypes.dtype('polygon', precision=4)
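The effect of precision on a WKT string can be pictured with the standard-library sketch below, which rounds every decimal coordinate in the text to a fixed number of places. The function `round_wkt` is a hypothetical stand-in written for this page; cfdb relies on shapely for its WKT handling, and its exact output formatting may differ.

```python
import re


# Hypothetical helper for illustration only; not part of cfdb or shapely.
def round_wkt(wkt, precision):
    """Round every decimal number in a WKT string to `precision` places."""
    def _fmt(match):
        return f"{float(match.group()):.{precision}f}"

    return re.sub(r"-?\d+\.\d+", _fmt, wkt)


print(round_wkt("POINT (172.63622 -43.53205)", 3))  # POINT (172.636 -43.532)
```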

DataType Base Class¶

All types share these attributes:

Attribute	Type	Description
`name`	str	Type name
`kind`	str	Kind code (f=float, i=int, M=datetime, T=string, G=geometry, u=unsigned, b=bool)
`itemsize`	int or None	Bytes per element (None for variable-length)
`dtype_decoded`	np.dtype	Decoded (in-memory) numpy dtype
`dtype_encoded`	np.dtype or None	Encoded (on-disk) numpy dtype
`precision`	int or None	Decimal precision or WKT rounding
`fillvalue`	int or None	Fill value for encoded data
`offset`	number or None	Offset for encoding

Helper Function¶

compute_scale_and_offset¶

`compute_scale_and_offset(min_value, max_value, dtype)` ¶

Computes the scale (slope) and offset for a dataset from a min value, a max value, and the required np.dtype. One value at the lower extreme of the integer range is left unused by the data so that it can serve as the NaN fillvalue. The minimum values set aside for the fillvalue (up to 64 bits) are:

int8: -128
int16: -32768
int32: -2147483648
int64: -9223372036854775808

Unsigned integers are allowed, in which case a value of 0 is set aside for the fillvalue.
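One way such a computation can work is sketched below in plain Python. It assumes the common packing convention `decoded = encoded * scale + offset` and takes the dtype as a string name; the function names and the exact formula are assumptions for illustration and are not taken from cfdb's implementation.

```python
# Hypothetical helpers for illustration only; not cfdb's implementation.
def int_limits(dtype_name):
    """Return (lowest, highest) for names such as 'int16' or 'uint8'."""
    unsigned = dtype_name.startswith("u")
    bits = int(dtype_name[4:] if unsigned else dtype_name[3:])
    if unsigned:
        return 0, 2 ** bits - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def scale_and_offset_sketch(min_value, max_value, dtype_name):
    """Map [min_value, max_value] onto the integer range, keeping the
    lowest integer free for the fillvalue."""
    lowest, highest = int_limits(dtype_name)
    fillvalue = lowest
    first_data = lowest + 1  # smallest integer available to real data
    scale = (max_value - min_value) / (highest - first_data)
    offset = min_value - first_data * scale
    return scale, offset, fillvalue


def encode(value, scale, offset):
    """Invert decoded = encoded * scale + offset, rounding to an integer."""
    return round((value - offset) / scale)


scale, offset, fill = scale_and_offset_sketch(-50.0, 100.0, "int16")
low = encode(-50.0, scale, offset)    # lands on -32767, one above the fillvalue
high = encode(100.0, scale, offset)   # lands on 32767, the top of the range
```

In this sketch the data occupy every integer except the lowest one, so -32768 stays free for int16, and 0 stays free when an unsigned type such as uint8 is requested.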