Data Types¶
dtype() Factory¶
dtype(name, precision=None, min_value=None, max_value=None, dtype_encoded=None, offset=None, fillvalue=None)
Initialises a cfdb DataType. A cfdb data type describes not only the type of the user's data, but also how that data is serialised (and encoded) to bytes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str or np.dtype or DataType | The name of the data type. It can be a string name, a np.dtype, or a DataType. A string name must correspond to a numpy dtype (used for the decoding), except for geometry dtypes. Geometry data types do not exist in numpy, so for these name must be one of 'point', 'line', 'linestring', or 'polygon'. | required |
| precision | int | The number of decimal places of precision of the data. This is essentially the value that you would pass to the round function/method. Applies to float and DateTime dtypes, and must be passed for geometry dtypes. | None |
| min_value | float or int or str or datetime64 | The minimum possible value of the data. Along with max_value and precision, this helps to shrink the data when serialising to bytes. Only applies to float and DateTime dtypes, and is only used to determine the dtype encoding. | None |
| max_value | float or int or str or datetime64 | The maximum possible value of the data. See min_value for a description. | None |
| dtype_encoded | str | The np.dtype string name to be used in the encoding. Only applies to float and DateTime dtypes; offset and fillvalue must also be passed. | None |
| offset | float or int | The offset used when encoding float and DateTime dtypes. | None |
| fillvalue | int | The fill value used when encoding float and DateTime dtypes. | None |
Notes
When the decoded dtype is a float or DateTime dtype, the data can be encoded to a smaller integer. To determine the appropriate encoding, precision, min_value, and max_value must all be passed. Alternatively, if the user already knows the resulting dtype_encoded, offset, and fillvalue, these can be passed instead of the other three. If neither set is passed in full, no integer encoding will be used.
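To make the idea of integer encoding concrete, here is a minimal sketch in plain Python. It is not cfdb's internal implementation: the helper names (`pick_encoding`, `encode`, `decode`), the choice of unsigned integers, and the use of 0 as the fill value are all assumptions made for illustration. It shows how a precision and a value range together bound the number of distinct codes needed, and therefore the smallest integer type that can hold them.

```python
# Illustrative only: a hypothetical scheme, not cfdb's internal implementation.
UNSIGNED_BITS = (("uint8", 8), ("uint16", 16), ("uint32", 32), ("uint64", 64))
FILLVALUE = 0  # assumed code reserved for missing values


def pick_encoding(precision, min_value, max_value):
    """Return the smallest unsigned dtype name that can hold the value range."""
    n_steps = round((max_value - min_value) * 10 ** precision)
    for name, bits in UNSIGNED_BITS:
        # Codes 1..(2**bits - 1) are usable; 0 is kept for the fill value.
        if n_steps + 1 <= 2 ** bits - 1:
            return name
    raise ValueError("range too large for a 64-bit integer encoding")


def encode(value, precision, min_value):
    """Map a value (or None for missing) to an integer code."""
    if value is None:
        return FILLVALUE
    return round((value - min_value) * 10 ** precision) + 1


def decode(code, precision, min_value):
    """Invert encode(); the fill value decodes to None."""
    if code == FILLVALUE:
        return None
    return (code - 1) / 10 ** precision + min_value


dtype_name = pick_encoding(precision=2, min_value=-50, max_value=100)
code = encode(21.37, precision=2, min_value=-50)
print(dtype_name, code, decode(code, precision=2, min_value=-50))
```

With two decimals over the range -50 to 100 there are 15001 distinct values, which do not fit in 8 bits but fit comfortably in 16, so an 8-byte float64 could in principle be stored in 2 bytes under this scheme.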
Geometry data are encoded and decoded via WKT (and converted to/from bytes via msgpack), and currently the precision parameter is required.
Returns:
| Type | Description |
|---|---|
| DataType | |
Type Classes¶
All types inherit from DataType and provide dumps()/loads() for serialization.
Float¶
Floating-point data with optional integer encoding:
cfdb.dtypes.dtype('float32') # no encoding
cfdb.dtypes.dtype('float64', precision=2) # rounds to 2 decimals
cfdb.dtypes.dtype('float64', precision=2, min_value=-50, max_value=100) # integer encoding
Integer¶
Integer data with optional smaller encoding:
DateTime¶
Numpy datetime64 data:
Bool¶
Boolean data:
String¶
Variable-length strings (msgpack serialized):
Point / LineString / Polygon¶
Geometry types using shapely and WKT:
cfdb.dtypes.dtype('point', precision=6)
cfdb.dtypes.dtype('linestring', precision=4)
cfdb.dtypes.dtype('polygon', precision=4)
DataType Base Class¶
All types share these attributes:
| Attribute | Type | Description |
|---|---|---|
| name | str | Type name |
| kind | str | Kind code (f=float, i=int, M=datetime, T=string, G=geometry, u=unsigned, b=bool) |
| itemsize | int or None | Bytes per element (None for variable-length) |
| dtype_decoded | np.dtype | Decoded (in-memory) numpy dtype |
| dtype_encoded | np.dtype or None | Encoded (on-disk) numpy dtype |
| precision | int or None | Decimal precision or WKT rounding |
| fillvalue | int or None | Fill value for encoded data |
| offset | number or None | Offset for encoding |
Helper Function¶
compute_scale_and_offset¶
compute_scale_and_offset(min_value, max_value, dtype)
Computes the scale (slope) and offset for a dataset using a min value, a max value, and the required np.dtype. It leaves one value at the lower extreme to use for the nan fillvalue. These are the min values set aside for the fillvalue (up to 64 bits):

- int8: -128
- int16: -32768
- int32: -2147483648
- int64: -9223372036854775808

Unsigned integers are allowed, and for these a value of 0 is set aside for the fillvalue.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| min_value | int or float | The min value of the dataset. | required |
| max_value | int or float | The max value of the dataset. | required |
| dtype | dtype | The data type that you want to shrink the data down to. | required |
Returns:
| Type | Description |
|---|---|
| scale, offset as floats | |
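The computation described for compute_scale_and_offset can be sketched in plain Python as follows. This is one plausible linear-packing convention, where a decoded value is `code * scale + offset`, and it is not taken from cfdb's source. The function name `sketch_scale_and_offset`, the `INT_LIMITS` table, and the use of dtype names as strings are hypothetical. The real function may round or define the offset differently.

```python
# Illustrative only: a hypothetical convention, not cfdb's actual function.
INT_LIMITS = {
    "int8": (-128, 127),
    "int16": (-32768, 32767),
    "int32": (-2147483648, 2147483647),
    "int64": (-9223372036854775808, 9223372036854775807),
    "uint8": (0, 255),
    "uint16": (0, 65535),
    "uint32": (0, 4294967295),
    "uint64": (0, 18446744073709551615),
}


def sketch_scale_and_offset(min_value, max_value, dtype_name):
    """Return (scale, offset) so that decoded = code * scale + offset."""
    lowest, highest = INT_LIMITS[dtype_name]
    # The lowest representable code is reserved for the fill value,
    # so real data starts one code above it.
    first_code = lowest + 1
    scale = (max_value - min_value) / (highest - first_code)
    offset = min_value - first_code * scale
    return float(scale), float(offset)


scale, offset = sketch_scale_and_offset(-50, 100, "int16")
print(scale, offset)
```

Under this convention the first usable code decodes to min_value and the highest code decodes to max_value, while the reserved lowest code (-32768 for int16, 0 for unsigned types) never collides with real data.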