Rechunker
The Rechunker class provides on-the-fly rechunking without modifying the stored data. Access it via variable.rechunker().
cfdb.support_classes.Rechunker
guess_chunk_shape(target_chunk_size)
Guess an appropriate chunk layout for a dataset, given its shape and the size of each element in bytes. Chunks are allocated only as large as target_chunk_size. Each chunk dimension is assigned the highest composite number that fits within target_chunk_size. Using composite numbers benefits the rechunking process, because the least common multiple of two composite numbers is very likely to be significantly lower than the product of those two numbers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| target_chunk_size | int | The maximum size per chunk in bytes. | required |
Returns:
| Type | Description |
|---|---|
| tuple of ints | shape of the chunk |
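The point about composite numbers can be checked with a few lines of arithmetic. This is only an illustration of the least common multiple argument, not the library's chunk-guessing code; the helper name `lcm_and_product` is made up for this example.

```python
from math import lcm  # math.lcm requires Python 3.9+


def lcm_and_product(a, b):
    """Return (least common multiple, product) for two chunk lengths."""
    return lcm(a, b), a * b


# Two composite chunk lengths share factors, so the LCM is far below the product.
composite_case = lcm_and_product(12, 18)
print(composite_case)  # (36, 216)

# Two prime chunk lengths share nothing, so the LCM equals the product.
prime_case = lcm_and_product(11, 17)
print(prime_case)  # (187, 187)
```

A smaller least common multiple means a smaller block can be aligned to both the source and the target chunk boundaries.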
rechunk(target_chunk_shape, max_mem=2 ** 27)
Takes a target chunk shape and a maximum memory size, and returns a generator that yields the data in the new target chunk shape. The rechunking is optimised by staging data in an in-memory numpy ndarray whose size is bounded by max_mem.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| target_chunk_shape | | The chunk_shape of the target. | required |
| max_mem | int | The max allocated memory to perform the chunking operation in bytes. This will only be as large as necessary for an optimum size chunk for the rechunking. | 2 ** 27 |
Returns:
| Type | Description |
|---|---|
| Generator | tuple of the target slices to the np.ndarray of data |
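A consumer of the generator might look like the sketch below. It assumes each yielded item is a pair of (target slices, data), which is one reading of the return description above and is not confirmed here. `write_rechunked`, `write_chunk`, and `FakeRechunker` are hypothetical names; the fake class is a 1-D stand-in using plain lists so the example runs without cfdb or numpy.

```python
def write_rechunked(rechunker, write_chunk, target_chunk_shape, max_mem=2 ** 27):
    """Pass every rechunked block to write_chunk and return the block count."""
    n_blocks = 0
    # Assumption: the generator yields (target_slices, data) pairs.
    for target_slices, data in rechunker.rechunk(target_chunk_shape, max_mem=max_mem):
        write_chunk(target_slices, data)
        n_blocks += 1
    return n_blocks


class FakeRechunker:
    """Hypothetical 1-D stand-in, used only so the sketch runs without cfdb."""

    def __init__(self, values):
        self.values = list(values)

    def rechunk(self, target_chunk_shape, max_mem=2 ** 27):
        step = target_chunk_shape[0]
        for start in range(0, len(self.values), step):
            stop = min(start + step, len(self.values))
            yield (slice(start, stop),), self.values[start:stop]


out = [None] * 10


def write_chunk(target_slices, data):
    # Store the block at the position described by its target slices.
    out[target_slices[0]] = data


n_blocks = write_rechunked(FakeRechunker(range(10)), write_chunk, (4,))
print(n_blocks, out)
```

With a real variable, the first argument would be the object returned by variable.rechunker(), and write_chunk would store each block in the destination of your choice.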
calc_n_chunks()
Calculate the total number of chunks in the existing variable.
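For a regularly chunked array, the chunk count is the product over axes of the number of chunks needed to cover each axis. The sketch below shows that arithmetic; it assumes regular chunking with partial chunks at the edges, and `count_chunks` is a hypothetical helper, not the method itself.

```python
from math import prod


def count_chunks(shape, chunk_shape):
    """Count the chunks needed to cover an array, including partial edge chunks."""
    # -(-n // c) is integer ceiling division.
    return prod(-(-n // c) for n, c in zip(shape, chunk_shape))


# 100 / 30 needs 4 chunks along axis 0, 50 / 20 needs 3 along axis 1.
total = count_chunks((100, 50), (30, 20))
print(total)  # 12
```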
calc_n_reads_rechunker(target_chunk_shape, max_mem=2 ** 27)
Calculate the total number of reads and writes using the rechunker.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| target_chunk_shape | Tuple[int, ...] | The chunk_shape of the target. | required |
| max_mem | int | The max allocated memory to perform the chunking operation in bytes. This will only be as large as necessary for an optimum size chunk for the rechunking. | 2 ** 27 |
Returns:
| Type | Description |
|---|---|
| tuple | of n_reads, n_writes |
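To put the returned counts in context, it can help to compare them with a naive baseline in which every target chunk is built by reading each source chunk it overlaps, with no in-memory buffer. The sketch below computes that baseline only; it is not the rechunker's algorithm, and `overlaps_1d` and `naive_reads_writes` are hypothetical helpers.

```python
from math import prod


def overlaps_1d(length, source_chunk, target_chunk):
    """Count overlapping (source chunk, target chunk) pairs along one axis."""
    total = 0
    for start in range(0, length, target_chunk):
        stop = min(start + target_chunk, length)
        # Index of last source chunk touched minus index of first, plus one.
        total += (stop - 1) // source_chunk - start // source_chunk + 1
    return total


def naive_reads_writes(shape, source_chunk_shape, target_chunk_shape):
    """Return (n_reads, n_writes) when each target chunk is assembled separately."""
    n_reads = prod(
        overlaps_1d(n, s, t)
        for n, s, t in zip(shape, source_chunk_shape, target_chunk_shape)
    )
    n_writes = prod(-(-n // t) for n, t in zip(shape, target_chunk_shape))
    return n_reads, n_writes


baseline = naive_reads_writes((10, 10), (4, 4), (3, 3))
print(baseline)  # (36, 16)
```

In this example the source holds only 9 chunks, so the naive approach reads each of them four times on average; a larger max_mem gives the rechunker room to avoid such repeated reads.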
calc_ideal_read_chunk_shape(target_chunk_shape)
Calculates the minimum ideal read chunk shape between a source and target.
calc_ideal_read_chunk_mem(target_chunk_shape)
Calculates the minimum ideal read chunk memory between a source and target.
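One plausible reading of "ideal" in the two methods above is the smallest block that both the source and target chunk grids tile exactly, which is the per-axis least common multiple of the two chunk shapes. The page does not spell out the formula, so treat the sketch below as an assumption; the function names and the explicit `source_chunk_shape` and `itemsize` arguments are hypothetical.

```python
from math import lcm, prod  # math.lcm requires Python 3.9+


def ideal_read_chunk_shape(source_chunk_shape, target_chunk_shape):
    """Per-axis LCM: the smallest block aligned to both chunk grids (assumed)."""
    return tuple(lcm(s, t) for s, t in zip(source_chunk_shape, target_chunk_shape))


def ideal_read_chunk_mem(source_chunk_shape, target_chunk_shape, itemsize):
    """Bytes needed to hold one ideal read chunk of elements of size itemsize."""
    shape = ideal_read_chunk_shape(source_chunk_shape, target_chunk_shape)
    return prod(shape) * itemsize


shape = ideal_read_chunk_shape((12, 18), (8, 30))
mem = ideal_read_chunk_mem((12, 18), (8, 30), itemsize=4)
print(shape, mem)  # (24, 90) 8640
```

If the ideal block needs more memory than is available, a smaller read chunk has to be chosen, which is the case calc_source_read_chunk_shape addresses.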
calc_source_read_chunk_shape(target_chunk_shape, max_mem)
Calculates the optimum read chunk shape given a maximum amount of available memory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| target_chunk_shape | Tuple[int, ...] | The target chunk shape | required |
| max_mem | int | The max allocated memory to perform the chunking operation in bytes. | required |
Returns:
| Type | Description |
|---|---|
| tuple of ints | optimal chunk shape |